scrape-do-mcp 0.1.6 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Abel
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README-ZH.md CHANGED
@@ -2,25 +2,40 @@
 
 [English Docs](./README.md) | 中文文档
 
- Scrape.do web scraping and Google search MCP server - with anti-bot protection support
-
- ## Features
-
- - **scrape_url**: Scrape any webpage and return its content as Markdown. Automatically bypasses Cloudflare, WAFs, CAPTCHAs, and anti-bot protection. Supports JavaScript-rendered pages.
- - **google_search**: Search Google and return structured SERP results as JSON. Includes organic results, knowledge graph, local businesses, news, related questions (People Also Ask), and more.
-
- ## Available Tools
-
- | Tool | Description |
- |------|------|
- | `scrape_url` | Scrape any webpage and return its content as Markdown. Automatically bypasses Cloudflare, WAFs, CAPTCHAs, and anti-bot protection. Supports JavaScript-rendered pages. |
- | `google_search` | Search Google and return structured SERP results as JSON. Includes organic results, knowledge graph, local businesses, news, related questions (People Also Ask), video results, and more. |
+ This package wraps the main API capabilities from the official Scrape.do documentation as MCP tools: the main scraping API, the Google Search API, the Amazon Scraper API, the Async API, and a Proxy Mode configuration helper.
+
+ Official docs: https://scrape.do/documentation/
+
+ ## Coverage
+
+ - `scrape_url`: the main Scrape.do scraping API, with JS rendering, geo-targeting, session persistence, screenshots, ReturnJSON, browser interactions, cookies, and header forwarding.
+ - `google_search`: the structured Google Search API, with `google_domain`, `location`, `uule`, `lr`, `cr`, `safe`, `nfpr`, `filter`, pagination, and raw HTML.
+ - `amazon_product`: the Amazon PDP endpoint.
+ - `amazon_offer_listing`: the Amazon seller offers endpoint.
+ - `amazon_search`: the Amazon search/category results endpoint.
+ - `amazon_raw_html`: the raw Amazon HTML endpoint.
+ - `async_create_job`, `async_get_job`, `async_get_task`, `async_list_jobs`, `async_cancel_job`, `async_get_account`: the Async API.
+ - `proxy_mode_config`: generates Proxy Mode connection details and parameter strings without leaking your token in the tool output.
+
+ ## Compatibility Notes
+
+ - `scrape_url` accepts both MCP-friendly aliases and the official parameter names:
+ - `render_js` or `render`
+ - `super_proxy` or `super`
+ - `screenshot` or `screenShot`
+ - `google_search` likewise accepts:
+ - `query` or `q`
+ - `country` or `gl`
+ - `language` or `hl`
+ - `domain` or `google_domain`
+ - `includeHtml` or `include_html`
+ - For header forwarding in `scrape_url`, use `headers` plus `header_mode` (`custom` / `extra` / `forward`).
+ - Screenshot results are returned as MCP image content rather than plain base64 text.
+ - When ReturnJSON is not enabled, `scrape_url` defaults to `output="markdown"`, which is easier for LLMs to read; set `output="raw"` manually if you want behavior closer to the raw HTTP API.
 
 ## Installation
 
- ### Quick Install (Recommended)
-
- Run the following command in your terminal:
+ ### Quick Install
 
 ```bash
 claude mcp add-json scrape-do --scope user '{
@@ -28,13 +43,11 @@ claude mcp add-json scrape-do --scope user '{
 "command": "npx",
 "args": ["-y", "scrape-do-mcp"],
 "env": {
- "SCRAPE_DO_TOKEN": "你的Token"
+ "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
 }
 }'
 ```
 
- Replace `你的Token` with the API token you get from https://app.scrape.do.
-
 ### Claude Desktop
 
 Add to `~/.claude.json`:
@@ -46,108 +59,57 @@ claude mcp add-json scrape-do --scope user '{
 "command": "npx",
 "args": ["-y", "scrape-do-mcp"],
 "env": {
- "SCRAPE_DO_TOKEN": "你的Token"
+ "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
 }
 }
 }
 }
 ```
 
- Get a free API token: https://app.scrape.do
-
- ## Usage
-
- ### scrape_url
+ Get your token at: https://app.scrape.do
 
- Scrape any webpage and get its content as Markdown.
-
- ```typescript
- // Parameters
- {
- url: string, // URL to scrape
- render_js?: boolean, // render JavaScript (default: false)
- super_proxy?: boolean, // use residential proxies (costs 10 credits, default: false)
- output?: "markdown" | "raw" // output format (default: markdown)
- }
- ```
-
- ### google_search
-
- Search Google and get structured results.
-
- ```typescript
- // Parameters
- {
- query: string, // search query
- country?: string, // country code (default: "us")
- language?: string, // interface language (default: "en")
- page?: number, // page number (default: 1)
- time_period?: "" | "last_hour" | "last_day" | "last_week" | "last_month" | "last_year",
- device?: "desktop" | "mobile" // device type (default: desktop)
- }
- ```
-
- ## Usage Examples
-
- ### Scrape a Webpage
- ```
- Please scrape https://github.com and give me the main content (in Markdown).
- ```
+ ## Available Tools
 
- ### Google Search
- ```
- Search for "best Python web frameworks 2026" and return the top 5 results.
+ | Tool | Purpose |
+ |------|------|
+ | `scrape_url` | Scrape.do scraping API |
+ | `google_search` | Structured Google search results |
+ | `amazon_product` | Amazon PDP structured data |
+ | `amazon_offer_listing` | All Amazon seller offers |
+ | `amazon_search` | Amazon search/category results |
+ | `amazon_raw_html` | Raw Amazon HTML |
+ | `async_create_job` | Create Async API jobs |
+ | `async_get_job` | Fetch Async job details |
+ | `async_get_task` | Fetch Async task details |
+ | `async_list_jobs` | List Async jobs |
+ | `async_cancel_job` | Cancel Async jobs |
+ | `async_get_account` | Fetch Async account/concurrency info |
+ | `proxy_mode_config` | Generate Proxy Mode configuration |
+
+ ## Example Prompts
+
+ ```text
+ Scrape https://example.com with render=true and wait for #app to appear.
 ```
 
- ### Filtered Search
- ```
- Search "AI news" in Chinese, limited to China, from the past week.
+ ```text
+ Search "open source MCP servers" with google_domain=google.co.uk and lr=lang_en.
 ```
 
- ### JavaScript Rendering
- ```
- Scrape this React single-page application: https://example-spa.com
- Use render_js=true to get the fully rendered content.
+ ```text
+ Get the PDP data for Amazon ASIN B0C7BKZ883 in the US with zipcode=10001.
 ```
 
- ### Get Raw HTML
- ```
- Scrape https://example.com and return the raw HTML instead of markdown.
+ ```text
+ Create an async scraping job for these 20 URLs and return the job ID.
 ```
 
- ## Comparison with Other Tools
-
- | Feature | scrape-do-mcp | Firecrawl | Browserbase |
- |------|--------------|-----------|-------------|
- | Google Search | ✅ | ❌ | ❌ |
- | Free credits | 1,000 | 500 | None |
- | Pricing | Pay per use | $19+/mo | $15+/mo |
- | MCP native | ✅ | ✅ | ❌ |
- | Setup required | None | API key | API key + browser |
-
- ### Why scrape-do-mcp?
-
- - **Zero setup**: get a token and start using it immediately
- - **All-in-one**: web scraping and Google search in a single MCP
- - **Anti-bot bypass**: automatically handles Cloudflare, WAFs, CAPTCHAs
- - **Cost-effective**: pay as you go, with a free tier available
-
- ## Credit Usage
-
- | Tool | Credit cost |
- |------|---------|
- | scrape_url (regular) | 1 credit/request |
- | scrape_url (super_proxy) | 10 credits/request |
- | google_search | 1 credit/request |
-
- Registration comes with **1,000 free credits**: https://app.scrape.do
-
 ## Development
 
 ```bash
 npm install
 npm run build
- npm run dev # run in development mode
+ npm run dev
 ```
 
 ## License
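The alias pairs listed in the compatibility notes above (`query`/`q`, `country`/`gl`, `language`/`hl`, `domain`/`google_domain`) can be pictured as a small normalization step before the search request is built. The sketch below is illustrative only — `normalizeGoogleParams` is a hypothetical name, not an export of this package, and the precedence shown (alias first, then official name, then default) is an assumption modeled on how `scrape_url` resolves `render_js ?? render`:

```javascript
// Hypothetical normalizer showing how MCP-friendly aliases could map onto
// the official Google Search parameter names documented above.
function normalizeGoogleParams(input) {
  return {
    q: input.query ?? input.q,                         // query / q
    gl: input.country ?? input.gl ?? "us",             // country / gl
    hl: input.language ?? input.hl ?? "en",            // language / hl
    google_domain: input.domain ?? input.google_domain, // domain / google_domain
  };
}

console.log(normalizeGoogleParams({ query: "mcp servers", domain: "google.co.uk" }));
// { q: 'mcp servers', gl: 'us', hl: 'en', google_domain: 'google.co.uk' }
```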
package/README.md CHANGED
@@ -2,25 +2,40 @@
 
 [中文文档](./README-ZH.md) | English
 
- MCP Server for Scrape.do - Web Scraping & Google Search with anti-bot bypass
-
- ## Features
-
- - **scrape_url**: Scrape any webpage and return content as Markdown. Automatically bypasses Cloudflare, WAFs, CAPTCHAs, and anti-bot protection. Supports JavaScript-rendered pages.
- - **google_search**: Search Google and return structured SERP results as JSON. Returns organic results, knowledge graph, local businesses, news stories, and more.
-
- ## Available Tools
-
- | Tool | Description |
- |------|-------------|
- | `scrape_url` | Scrape any webpage and return content as Markdown. Automatically bypasses Cloudflare, WAFs, CAPTCHAs, and anti-bot protection. Supports JavaScript-rendered pages. |
- | `google_search` | Search Google and return structured SERP results as JSON. Returns organic results, knowledge graph, local businesses, news stories, related questions (People Also Ask), video results, and more. |
+ An MCP server that wraps Scrape.do's documented APIs in one package: the main scraping API, Google Search API, Amazon Scraper API, Async API, and a Proxy Mode configuration helper.
+
+ Official docs: https://scrape.do/documentation/
+
+ ## Coverage
+
+ - `scrape_url`: Main Scrape.do API with JS rendering, geo-targeting, session persistence, screenshots, ReturnJSON, browser interactions, cookies, and header forwarding.
+ - `google_search`: Structured Google SERP API with `google_domain`, `location`, `uule`, `lr`, `cr`, `safe`, `nfpr`, `filter`, pagination, and optional raw HTML.
+ - `amazon_product`: Amazon PDP endpoint.
+ - `amazon_offer_listing`: Amazon offer listing endpoint.
+ - `amazon_search`: Amazon search/category endpoint.
+ - `amazon_raw_html`: Raw HTML Amazon endpoint with geo-targeting.
+ - `async_create_job`, `async_get_job`, `async_get_task`, `async_list_jobs`, `async_cancel_job`, `async_get_account`: Async API coverage.
+ - `proxy_mode_config`: Builds Proxy Mode connection details and parameter strings without exposing your token in tool output.
+
+ ## Compatibility Notes
+
+ - `scrape_url` supports both MCP-friendly aliases and official parameter names:
+ - `render_js` or `render`
+ - `super_proxy` or `super`
+ - `screenshot` or `screenShot`
+ - `google_search` supports:
+ - `query` or `q`
+ - `country` or `gl`
+ - `language` or `hl`
+ - `domain` or `google_domain`
+ - `includeHtml` or `include_html`
+ - For header forwarding in `scrape_url`, pass `headers` plus `header_mode` (`custom`, `extra`, or `forward`).
+ - Screenshot responses are returned as MCP image content instead of plain base64 text.
+ - `scrape_url` defaults to `output="markdown"` when ReturnJSON is not used so the tool stays LLM-friendly. Set `output="raw"` if you want the raw API-style output.
 
 ## Installation
 
- ### Quick Install (Recommended)
-
- Run this command in your terminal:
+ ### Quick Install
 
 ```bash
 claude mcp add-json scrape-do --scope user '{
@@ -33,11 +48,9 @@ claude mcp add-json scrape-do --scope user '{
 }'
 ```
 
- Replace `YOUR_TOKEN_HERE` with your Scrape.do API token from https://app.scrape.do
-
 ### Claude Desktop
 
- Add to your `~/.claude.json`:
+ Add this to `~/.claude.json`:
 
 ```json
 {
@@ -46,110 +59,57 @@ Add to your `~/.claude.json`:
 "command": "npx",
 "args": ["-y", "scrape-do-mcp"],
 "env": {
- "SCRAPE_DO_TOKEN": "your_token_here"
+ "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
 }
 }
 }
 }
 ```
 
- Get your free API token at: https://app.scrape.do
-
- ## Usage
-
- ### scrape_url
+ Get your token at https://app.scrape.do
 
- Scrape any webpage and get content as Markdown.
-
- ```typescript
- // Parameters
- {
- url: string, // Target URL to scrape
- render_js?: boolean, // Render JavaScript (default: false)
- super_proxy?: boolean, // Use residential proxies (costs 10 credits, default: false)
- output?: "markdown" | "raw" // Output format (default: markdown)
- }
- ```
-
- ### google_search
-
- Search Google and get structured results.
+ ## Available Tools
 
- ```typescript
- // Parameters
- {
- query: string, // Search query
- country?: string, // Country code (default: "us")
- language?: string, // Interface language (default: "en")
- page?: number, // Page number (default: 1)
- time_period?: "" | "last_hour" | "last_day" | "last_week" | "last_month" | "last_year",
- device?: "desktop" | "mobile" // Device type (default: desktop)
- }
- ```
+ | Tool | Purpose |
+ |------|---------|
+ | `scrape_url` | Main Scrape.do scraping API wrapper |
+ | `google_search` | Structured Google search results |
+ | `amazon_product` | Amazon PDP structured data |
+ | `amazon_offer_listing` | Amazon seller offers |
+ | `amazon_search` | Amazon keyword/category results |
+ | `amazon_raw_html` | Raw Amazon HTML with geo-targeting |
+ | `async_create_job` | Create Async API jobs |
+ | `async_get_job` | Fetch Async job details |
+ | `async_get_task` | Fetch Async task details |
+ | `async_list_jobs` | List Async jobs |
+ | `async_cancel_job` | Cancel Async jobs |
+ | `async_get_account` | Fetch Async account/concurrency info |
+ | `proxy_mode_config` | Generate Proxy Mode configuration |
 
 ## Example Prompts
 
- Here are some prompts you can use to invoke the tools:
-
- ### Scrape a Website
- ```
- Please scrape https://github.com and give me the main content as markdown.
+ ```text
+ Scrape https://example.com with render=true and wait for #app.
 ```
 
- ### Search Google
- ```
- Search Google for "best Python web frameworks 2026" and return the top 5 results.
+ ```text
+ Search Google for "open source MCP servers" with google_domain=google.co.uk and lr=lang_en.
 ```
 
- ### Search with Filters
- ```
- Search for "AI news" in Chinese, from China, last week.
- ```
-
- ### JavaScript Rendering
- ```
- Scrape this React Single Page Application: https://example-spa.com
- Use render_js=true to get the fully rendered content.
+ ```text
+ Get the Amazon PDP for ASIN B0C7BKZ883 in the US with zipcode 10001.
 ```
 
- ### Get Raw HTML
+ ```text
+ Create an async job for these 20 URLs and give me the job ID.
 ```
- Scrape https://example.com and return raw HTML instead of markdown.
- ```
-
- ## Comparison with Alternatives
-
- | Feature | scrape-do-mcp | Firecrawl | Browserbase |
- |---------|--------------|-----------|-------------|
- | Google Search | ✅ | ❌ | ❌ |
- | Free Credits | 1,000 | 500 | None |
- | Pricing | Pay per use | $19+/mo | $15+/mo |
- | MCP Native | ✅ | ✅ | ❌ |
- | Setup Required | None | API key | API key + browser |
-
- ### Why scrape-do-mcp?
-
- - **Zero setup**: Just get a token and use immediately
- - **All-in-one**: Both web scraping AND Google search in one MCP
- - **Anti-bot bypass**: Automatically handles Cloudflare, WAFs, CAPTCHAs
- - **Cost-effective**: Pay only for what you use, free tier available
-
- ## Credit Usage
-
- | Tool | Credit Cost |
- |------|-------------|
- | scrape_url (regular) | 1 credit/request |
- | scrape_url (super_proxy) | 10 credits/request |
- | google_search | 1 credit/request |
-
- Free registration includes **1,000 credits**: https://app.scrape.do
 
 ## Development
 
 ```bash
 npm install
 npm run build
- npm run dev # Run in development mode
+ npm run dev
 ```
 
 ## License
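The `scrape_url` tool described above ultimately issues a GET against `https://api.scrape.do` with the token and target URL passed as query parameters (visible in the `dist/index.js` diff below, where the params object is handed to axios). A minimal sketch of the request URL that implies — `buildScrapeUrl` and the trailing `/?` form are illustrative assumptions, not part of the package:

```javascript
// Hypothetical helper mirroring how scrape_url assembles its request:
// token and target URL as query parameters on https://api.scrape.do.
function buildScrapeUrl(token, targetUrl, options = {}) {
  const params = new URLSearchParams({ token, url: targetUrl });
  if (options.render) params.set("render", "true"); // render_js / render alias
  if (options.super) params.set("super", "true");   // super_proxy / super alias
  return `https://api.scrape.do/?${params.toString()}`;
}

console.log(buildScrapeUrl("YOUR_TOKEN_HERE", "https://example.com", { render: true }));
// https://api.scrape.do/?token=YOUR_TOKEN_HERE&url=https%3A%2F%2Fexample.com&render=true
```

Note that `URLSearchParams` percent-encodes the target URL, which is why the raw HTTP API requires the `url` value to be URL-encoded as well.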
package/dist/index.js CHANGED
@@ -4,97 +4,808 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
4
4
  return (mod && mod.__esModule) ? mod : { "default": mod };
5
5
  };
6
6
  Object.defineProperty(exports, "__esModule", { value: true });
7
+ const axios_1 = __importDefault(require("axios"));
7
8
  const mcp_js_1 = require("@modelcontextprotocol/sdk/server/mcp.js");
8
9
  const stdio_js_1 = require("@modelcontextprotocol/sdk/server/stdio.js");
9
10
  const zod_1 = require("zod");
10
- const axios_1 = __importDefault(require("axios"));
11
+ const SERVER_VERSION = "0.3.0";
11
12
  const SCRAPE_DO_TOKEN = process.env.SCRAPE_DO_TOKEN || "";
12
13
  const SCRAPE_API_BASE = "https://api.scrape.do";
14
+ const ASYNC_API_BASE = "https://q.scrape.do";
15
+ const headerValueSchema = zod_1.z.union([zod_1.z.string(), zod_1.z.number(), zod_1.z.boolean()]);
16
+ const headerRecordSchema = zod_1.z.record(zod_1.z.string(), headerValueSchema);
17
+ const browserActionSchema = zod_1.z.record(zod_1.z.string(), zod_1.z.union([zod_1.z.string(), zod_1.z.number(), zod_1.z.boolean()]));
18
+ const headerModeSchema = zod_1.z.enum(["custom", "extra", "forward"]);
19
+ const scrapeWaitUntilSchema = zod_1.z.enum(["domcontentloaded", "load", "networkidle", "networkidle0", "networkidle2"]);
20
+ const asyncWaitUntilSchema = zod_1.z.enum(["domcontentloaded", "networkidle0", "networkidle2"]);
21
+ const googleTimePeriodSchema = zod_1.z.enum(["last_hour", "last_day", "last_week", "last_month", "last_year"]);
22
+ const asyncMethodSchema = zod_1.z.enum(["GET", "POST", "PUT", "PATCH", "HEAD", "DELETE"]);
13
23
  const server = new mcp_js_1.McpServer({
14
24
  name: "scrape-do-mcp",
15
- version: "0.1.3",
25
+ version: SERVER_VERSION,
16
26
  });
17
- // ─── Tool 1: scrape_url ──────────────────────────────────────────────────────
18
- server.tool("scrape_url", "Scrape any webpage and return its content as Markdown. Automatically bypasses Cloudflare, WAFs, CAPTCHAs, and anti-bot protection. Supports JavaScript-rendered pages.", {
19
- url: zod_1.z.string().url().describe("The target URL to scrape"),
20
- render_js: zod_1.z.boolean().optional().default(false).describe("Render JavaScript (use for React/Vue/SPA pages)"),
21
- super_proxy: zod_1.z.boolean().optional().default(false).describe("Use residential/mobile proxies for harder-to-detect requests (costs 10 credits instead of 1)"),
22
- output: zod_1.z.enum(["markdown", "raw"]).optional().default("markdown").describe("Output format: markdown (default) or raw HTML"),
23
- }, async ({ url, render_js, super_proxy, output }) => {
24
- if (!SCRAPE_DO_TOKEN) {
27
+ function isRecord(value) {
28
+ return typeof value === "object" && value !== null && !Array.isArray(value);
29
+ }
30
+ function compactObject(value) {
31
+ return Object.fromEntries(Object.entries(value).filter(([, entry]) => entry !== undefined));
32
+ }
33
+ function stringifyUnknown(value) {
34
+ if (typeof value === "string") {
35
+ return value;
36
+ }
37
+ if (value instanceof ArrayBuffer) {
38
+ return Buffer.from(value).toString("utf8");
39
+ }
40
+ if (Buffer.isBuffer(value)) {
41
+ return value.toString("utf8");
42
+ }
43
+ if (value === undefined || value === null) {
44
+ return "";
45
+ }
46
+ try {
47
+ return JSON.stringify(value, null, 2);
48
+ }
49
+ catch {
50
+ return String(value);
51
+ }
52
+ }
53
+ function tryParseJson(value) {
54
+ try {
55
+ return JSON.parse(value);
56
+ }
57
+ catch {
58
+ return undefined;
59
+ }
60
+ }
61
+ function createErrorResult(message) {
62
+ return {
63
+ content: [{ type: "text", text: message }],
64
+ isError: true,
65
+ };
66
+ }
67
+ function createTextResult(text, structuredContent) {
68
+ return {
69
+ content: [{ type: "text", text }],
70
+ ...(structuredContent ? { structuredContent } : {}),
71
+ };
72
+ }
73
+ function createJsonResult(value) {
74
+ if (isRecord(value)) {
25
75
  return {
26
- content: [{ type: "text", text: "Error: SCRAPE_DO_TOKEN is not set. Get your free token at https://app.scrape.do" }],
27
- isError: true,
76
+ content: [{ type: "text", text: JSON.stringify(value, null, 2) }],
77
+ structuredContent: value,
28
78
  };
29
79
  }
80
+ return createTextResult(JSON.stringify(value, null, 2));
81
+ }
82
+ function createImageResult(images, note) {
83
+ const content = [];
84
+ if (note) {
85
+ content.push({ type: "text", text: note });
86
+ }
87
+ for (const image of images) {
88
+ content.push({
89
+ type: "image",
90
+ data: image.data,
91
+ mimeType: image.mimeType,
92
+ });
93
+ }
94
+ return { content };
95
+ }
96
+ function getErrorMessage(error) {
97
+ if (axios_1.default.isAxiosError(error)) {
98
+ const responseData = error.response?.data;
99
+ if (responseData !== undefined) {
100
+ return stringifyUnknown(responseData);
101
+ }
102
+ return error.message;
103
+ }
104
+ if (error instanceof Error) {
105
+ return error.message;
106
+ }
107
+ return String(error);
108
+ }
109
+ async function requestText(config) {
110
+ const response = await axios_1.default.request({
111
+ ...config,
112
+ responseType: "text",
113
+ transformResponse: [(value) => value],
114
+ });
115
+ return {
116
+ text: stringifyUnknown(response.data),
117
+ headers: response.headers,
118
+ };
119
+ }
120
+ function normalizeHeaderRecord(value) {
121
+ if (!value) {
122
+ return undefined;
123
+ }
124
+ return Object.fromEntries(Object.entries(value).map(([key, entry]) => [key, String(entry)]));
125
+ }
126
+ function resolveHeaderMode(input) {
127
+ const modes = new Set();
128
+ if (input.customHeaders) {
129
+ modes.add("custom");
130
+ }
131
+ if (input.extraHeaders) {
132
+ modes.add("extra");
133
+ }
134
+ if (input.forwardHeaders) {
135
+ modes.add("forward");
136
+ }
137
+ const explicitMode = input.header_mode ?? input.headerMode;
138
+ if (explicitMode) {
139
+ modes.add(explicitMode);
140
+ }
141
+ if (modes.size > 1) {
142
+ throw new Error("Choose only one header mode: custom, extra, or forward.");
143
+ }
144
+ if (modes.size === 1) {
145
+ return [...modes][0];
146
+ }
147
+ if (input.headers) {
148
+ return "custom";
149
+ }
150
+ return undefined;
151
+ }
152
+ function buildForwardedHeaders(headers, mode) {
153
+ const normalizedHeaders = normalizeHeaderRecord(headers);
154
+ if (!normalizedHeaders) {
155
+ return undefined;
156
+ }
157
+ if (mode !== "extra") {
158
+ return normalizedHeaders;
159
+ }
160
+ return Object.fromEntries(Object.entries(normalizedHeaders).map(([key, value]) => [key.toLowerCase().startsWith("sd-") ? key : `sd-${key}`, value]));
161
+ }
162
+ function inferMimeTypeFromBase64(value) {
163
+ if (value.startsWith("iVBORw0KGgo")) {
164
+ return "image/png";
165
+ }
166
+ if (value.startsWith("/9j/")) {
167
+ return "image/jpeg";
168
+ }
169
+ if (value.startsWith("R0lGOD")) {
170
+ return "image/gif";
171
+ }
172
+ if (value.startsWith("UklGR")) {
173
+ return "image/webp";
174
+ }
175
+ if (value.startsWith("Qk0")) {
176
+ return "image/bmp";
177
+ }
178
+ return undefined;
179
+ }
180
+ function maybeImageMatch(value) {
181
+ const trimmedValue = value.trim();
182
+ const dataUriMatch = trimmedValue.match(/^data:(image\/[a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=\s]+)$/);
183
+ if (dataUriMatch) {
184
+ return {
185
+ mimeType: dataUriMatch[1],
186
+ data: dataUriMatch[2].replace(/\s+/g, ""),
187
+ };
188
+ }
189
+ const normalizedValue = trimmedValue.replace(/\s+/g, "");
190
+ const mimeType = inferMimeTypeFromBase64(normalizedValue);
191
+ if (!mimeType || normalizedValue.length < 100) {
192
+ return undefined;
193
+ }
194
+ return {
195
+ mimeType,
196
+ data: normalizedValue,
197
+ };
198
+ }
199
+ function collectImageMatches(value, results = [], seen = new Set()) {
200
+ if (typeof value === "string") {
201
+ const match = maybeImageMatch(value);
202
+ if (match && !seen.has(match.data)) {
203
+ seen.add(match.data);
204
+ results.push(match);
205
+ }
206
+ return results;
207
+ }
208
+ if (Array.isArray(value)) {
209
+ for (const item of value) {
210
+ collectImageMatches(item, results, seen);
211
+ }
212
+ return results;
213
+ }
214
+ if (!isRecord(value)) {
215
+ return results;
216
+ }
217
+ const prioritizedKeys = ["screenShot", "screenshot", "fullScreenShot", "particularScreenShot", "image", "images"];
218
+ for (const key of prioritizedKeys) {
219
+ if (key in value) {
220
+ collectImageMatches(value[key], results, seen);
221
+ }
222
+ }
223
+ for (const [key, entry] of Object.entries(value)) {
224
+ if (!prioritizedKeys.includes(key)) {
225
+ collectImageMatches(entry, results, seen);
226
+ }
227
+ }
228
+ return results;
229
+ }
230
+ function buildProxyParameterString(params) {
231
+ if (!params) {
232
+ return "render=false";
233
+ }
234
+ const searchParams = new URLSearchParams();
235
+ for (const [key, value] of Object.entries(params)) {
236
+ searchParams.set(key, String(value));
237
+ }
238
+ return searchParams.toString();
239
+ }
240
+ function ensureToken() {
241
+ if (!SCRAPE_DO_TOKEN) {
242
+ throw new Error("SCRAPE_DO_TOKEN is not set. Get your token at https://app.scrape.do");
243
+ }
244
+ }
245
+ server.tool("scrape_url", "Scrape a webpage with the official Scrape.do API. Supports Markdown/raw output, JS rendering, screenshots, browser interactions, geo-targeting, header forwarding, session persistence, and ReturnJSON features.", {
246
+ url: zod_1.z.string().url().describe("The target URL to scrape"),
247
+ render_js: zod_1.z.boolean().optional().describe("Alias for render. Render JavaScript for SPA or dynamic pages."),
248
+ render: zod_1.z.boolean().optional().describe("Official Scrape.do render parameter."),
249
+ super_proxy: zod_1.z.boolean().optional().describe("Alias for super. Use residential/mobile proxies."),
250
+ super: zod_1.z.boolean().optional().describe("Official Scrape.do super parameter."),
251
+ geoCode: zod_1.z.string().optional().describe("Country code for geo-targeting."),
252
+ regionalGeoCode: zod_1.z.string().optional().describe("Regional geo-targeting code."),
253
+ device: zod_1.z.enum(["desktop", "mobile", "tablet"]).optional().default("desktop").describe("Device type to emulate."),
254
+ sessionId: zod_1.z.union([zod_1.z.number().int(), zod_1.z.string()]).optional().describe("Sticky session ID."),
255
+ timeout: zod_1.z.number().int().positive().optional().default(60000).describe("Maximum timeout in milliseconds."),
256
+ retryTimeout: zod_1.z.number().int().positive().optional().describe("Retry timeout in milliseconds."),
257
+ disableRetry: zod_1.z.boolean().optional().default(false).describe("Disable automatic retries."),
258
+ output: zod_1.z.enum(["markdown", "raw"]).optional().describe("Output format. MCP defaults to markdown unless ReturnJSON is used."),
259
+ returnJSON: zod_1.z.boolean().optional().default(false).describe("Return JSON with network requests/content."),
260
+ transparentResponse: zod_1.z.boolean().optional().default(false).describe("Return the target response without Scrape.do post-processing."),
261
+ screenshot: zod_1.z.boolean().optional().describe("Alias for screenShot. Capture a viewport screenshot."),
262
+ screenShot: zod_1.z.boolean().optional().describe("Official Scrape.do screenshot parameter."),
263
+ fullScreenShot: zod_1.z.boolean().optional().default(false).describe("Capture a full-page screenshot."),
264
+ particularScreenShot: zod_1.z.string().optional().describe("Capture a screenshot of a specific CSS selector."),
265
+ playWithBrowser: zod_1.z.array(browserActionSchema).optional().describe("Browser interaction script for Scrape.do."),
266
+ waitSelector: zod_1.z.string().optional().describe("CSS selector to wait for."),
267
+ customWait: zod_1.z.number().int().min(0).optional().describe("Additional wait time after load in milliseconds."),
268
+ waitUntil: scrapeWaitUntilSchema.optional().default("domcontentloaded").describe("Browser load event to wait for."),
269
+ width: zod_1.z.number().int().positive().optional().default(1920).describe("Viewport width."),
270
+ height: zod_1.z.number().int().positive().optional().default(1080).describe("Viewport height."),
271
+ blockResources: zod_1.z.boolean().optional().default(true).describe("Block CSS, images, and fonts."),
272
+ showFrames: zod_1.z.boolean().optional().default(false).describe("Include iframe content in ReturnJSON responses."),
273
+ showWebsocketRequests: zod_1.z.boolean().optional().default(false).describe("Include websocket requests in ReturnJSON responses."),
274
+ headers: headerRecordSchema.optional().describe("Header values to forward to Scrape.do for custom/extra/forward modes."),
275
+ header_mode: headerModeSchema.optional().describe("Header forwarding mode: custom, extra, or forward."),
276
+ headerMode: headerModeSchema.optional().describe("CamelCase alias for header_mode."),
277
+ customHeaders: zod_1.z.boolean().optional().describe("Enable official customHeaders mode."),
278
+ extraHeaders: zod_1.z.boolean().optional().describe("Enable official extraHeaders mode."),
279
+ forwardHeaders: zod_1.z.boolean().optional().describe("Enable official forwardHeaders mode."),
280
+ setCookies: zod_1.z.string().optional().describe("Cookies to send to the target page."),
281
+ pureCookies: zod_1.z.boolean().optional().default(false).describe("Return original Set-Cookie headers."),
282
+ disableRedirection: zod_1.z.boolean().optional().default(false).describe("Disable redirect following."),
283
+ callback: zod_1.z.string().url().optional().describe("Webhook callback URL."),
284
+ }, async (params) => {
285
+ try {
286
+ ensureToken();
287
+ const screenshotRequested = (params.screenshot ?? params.screenShot ?? false) || params.fullScreenShot || Boolean(params.particularScreenShot);
288
+ const interactionRequested = Boolean(params.playWithBrowser?.length);
289
+ const screenshotModeCount = [params.screenshot ?? params.screenShot ?? false, params.fullScreenShot, Boolean(params.particularScreenShot)].filter(Boolean).length;
290
+ if (screenshotModeCount > 1) {
291
+ return createErrorResult("Use only one screenshot mode at a time: screenShot, fullScreenShot, or particularScreenShot.");
292
+ }
293
+ if (params.particularScreenShot && interactionRequested) {
294
+ return createErrorResult("particularScreenShot cannot be used together with playWithBrowser.");
295
+ }
296
+ const headerMode = resolveHeaderMode(params);
297
+ const effectiveRender = (params.render_js ?? params.render ?? false) || params.returnJSON || params.showFrames || params.showWebsocketRequests || screenshotRequested || interactionRequested;
298
+ const effectiveReturnJSON = params.returnJSON || params.showFrames || params.showWebsocketRequests || screenshotRequested || interactionRequested;
299
+ const effectiveBlockResources = screenshotRequested || interactionRequested ? false : params.blockResources;
300
+ const effectiveOutput = effectiveReturnJSON ? params.output : params.output ?? "markdown";
301
+ const requestParams = compactObject({
302
+ token: SCRAPE_DO_TOKEN,
303
+ url: params.url,
304
+ render: effectiveRender || undefined,
305
+ super: params.super_proxy ?? params.super,
306
+ geoCode: params.geoCode,
307
+ regionalGeoCode: params.regionalGeoCode,
308
+ device: params.device !== "desktop" ? params.device : undefined,
309
+ sessionId: params.sessionId,
310
+ timeout: params.timeout !== 60000 ? params.timeout : undefined,
311
+ retryTimeout: params.retryTimeout,
312
+ disableRetry: params.disableRetry || undefined,
313
+ output: effectiveOutput,
314
+ returnJSON: effectiveReturnJSON || undefined,
315
+ transparentResponse: params.transparentResponse || undefined,
316
+ screenShot: (params.screenshot ?? params.screenShot ?? false) || undefined,
317
+ fullScreenShot: params.fullScreenShot || undefined,
318
+ particularScreenShot: params.particularScreenShot,
319
+ playWithBrowser: params.playWithBrowser?.length ? JSON.stringify(params.playWithBrowser) : undefined,
320
+ waitSelector: params.waitSelector,
321
+ customWait: params.customWait,
322
+ waitUntil: params.waitUntil !== "domcontentloaded" ? params.waitUntil : undefined,
323
+ width: params.width !== 1920 ? params.width : undefined,
324
+ height: params.height !== 1080 ? params.height : undefined,
325
+ blockResources: effectiveBlockResources === false ? false : undefined,
326
+ showFrames: params.showFrames || undefined,
327
+ showWebsocketRequests: params.showWebsocketRequests || undefined,
328
+ customHeaders: headerMode === "custom" || params.customHeaders ? true : undefined,
329
+ extraHeaders: headerMode === "extra" || params.extraHeaders ? true : undefined,
330
+ forwardHeaders: headerMode === "forward" || params.forwardHeaders ? true : undefined,
331
+ setCookies: params.setCookies,
332
+ pureCookies: params.pureCookies || undefined,
333
+ disableRedirection: params.disableRedirection || undefined,
334
+ callback: params.callback,
335
+ });
336
+ const headers = buildForwardedHeaders(params.headers, headerMode);
337
+ const { text } = await requestText({
338
+ method: "GET",
339
+ url: SCRAPE_API_BASE,
340
+ params: requestParams,
341
+ headers,
342
+ timeout: Math.min(params.timeout ?? 60000, 120000),
343
+ });
344
+ const parsed = tryParseJson(text);
345
+ const images = screenshotRequested || interactionRequested ? collectImageMatches(parsed ?? text) : [];
346
+ if (images.length > 0) {
347
+ const note = images.length === 1 ? "Captured screenshot from Scrape.do." : `Captured ${images.length} screenshots from Scrape.do.`;
348
+ return createImageResult(images, note);
349
+ }
350
+ if (parsed !== undefined) {
351
+ return createJsonResult(parsed);
352
+ }
353
+ return createTextResult(text);
354
+ }
355
+ catch (error) {
356
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
357
+ }
358
+ });
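Editor's note: the scrape_url handler above derives several "effective" flags before building the request — any screenshot, browser-interaction, or returnJSON request forces rendering on, and screenshots/interactions switch resource blocking off. A minimal sketch of that cascade (hypothetical helper name; `screenshotRequested` and `interactionRequested` are computed earlier in the file and are reconstructed here from the parameters the handler reads):

```javascript
// Hypothetical helper mirroring the flag cascade in the scrape_url handler.
function resolveEffectiveFlags(params) {
  // Reconstructed: any screenshot variant counts as a screenshot request.
  const screenshotRequested = Boolean(
    params.screenshot || params.screenShot || params.fullScreenShot || params.particularScreenShot
  );
  // Reconstructed: a non-empty playWithBrowser script counts as interaction.
  const interactionRequested = Boolean(params.playWithBrowser && params.playWithBrowser.length);
  // returnJSON is forced on by frames, websocket capture, screenshots, or interaction.
  const effectiveReturnJSON = Boolean(
    params.returnJSON || params.showFrames || params.showWebsocketRequests ||
    screenshotRequested || interactionRequested
  );
  // Rendering is on if explicitly requested, or whenever returnJSON is forced on.
  const effectiveRender = Boolean(params.render_js ?? params.render ?? false) || effectiveReturnJSON;
  // Screenshots and interactions need all resources loaded, so blocking is disabled.
  const effectiveBlockResources =
    screenshotRequested || interactionRequested ? false : params.blockResources;
  return { effectiveRender, effectiveReturnJSON, effectiveBlockResources };
}
```

For example, requesting a screenshot with `blockResources: true` still yields rendering on and resource blocking off, matching the handler's behavior.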
359
+ server.tool("google_search", "Search Google with Scrape.do's structured SERP API. Supports localization, google_domain, UULE/location targeting, filters, pagination, and optional raw HTML.", {
360
+ query: zod_1.z.string().optional().describe("Alias for q. Search query."),
361
+ q: zod_1.z.string().optional().describe("Official Google Search query parameter."),
362
+ country: zod_1.z.string().optional().default("us").describe("Alias for gl. Country code."),
363
+ gl: zod_1.z.string().optional().describe("Official Google geo-location parameter."),
364
+ language: zod_1.z.string().optional().default("en").describe("Alias for hl. Interface language."),
365
+ hl: zod_1.z.string().optional().describe("Official Google interface language parameter."),
366
+ domain: zod_1.z.string().optional().describe("Deprecated alias for google_domain."),
367
+ google_domain: zod_1.z.string().optional().describe("Official Google domain parameter."),
368
+ page: zod_1.z.number().int().positive().optional().default(1).describe("1-based page number."),
369
+ start: zod_1.z.number().int().min(0).optional().describe("Official Google result offset. Overrides page."),
370
+ num: zod_1.z.number().int().positive().optional().describe("Number of results per page."),
371
+ time_period: googleTimePeriodSchema.optional().describe("Time-based search filter."),
372
+ device: zod_1.z.enum(["desktop", "mobile"]).optional().default("desktop").describe("SERP layout device."),
373
+ includeHtml: zod_1.z.boolean().optional().describe("Alias for include_html."),
374
+ include_html: zod_1.z.boolean().optional().describe("Include raw Google HTML in the response."),
375
+ location: zod_1.z.string().optional().describe("Canonical Google location string."),
376
+ uule: zod_1.z.string().optional().describe("UULE-encoded location string."),
377
+ lr: zod_1.z.string().optional().describe("Strict language filter such as lang_en."),
378
+ cr: zod_1.z.string().optional().describe("Strict country filter such as countryUS."),
379
+ safe: zod_1.z.string().optional().describe("SafeSearch mode. Use active to filter adult content."),
380
+ nfpr: zod_1.z.boolean().optional().describe("Disable spelling correction."),
381
+ filter: zod_1.z.union([zod_1.z.string(), zod_1.z.number()]).optional().describe("Result filtering control. Use 0 to disable similar/omitted result filtering."),
382
+ }, async (params) => {
383
+ try {
384
+ ensureToken();
385
+ const query = params.query ?? params.q;
386
+ if (!query) {
387
+ return createErrorResult("Error: query or q is required.");
388
+ }
389
+ const start = params.start ?? Math.max((params.page - 1) * (params.num ?? 10), 0);
390
+ const requestParams = compactObject({
391
+ token: SCRAPE_DO_TOKEN,
392
+ q: query,
393
+ gl: params.gl ?? params.country,
394
+ hl: params.hl ?? params.language,
395
+ google_domain: params.google_domain ?? params.domain,
396
+ start,
397
+ num: params.num,
398
+ time_period: params.time_period,
399
+ device: params.device,
400
+ include_html: params.include_html ?? params.includeHtml ? true : undefined,
401
+ location: params.location,
402
+ uule: params.uule,
403
+ lr: params.lr,
404
+ cr: params.cr,
405
+ safe: params.safe,
406
+ nfpr: params.nfpr,
407
+ filter: params.filter,
408
+ });
409
+ const { text } = await requestText({
410
+ method: "GET",
411
+ url: `${SCRAPE_API_BASE}/plugin/google/search`,
412
+ params: requestParams,
413
+ timeout: 60000,
414
+ });
415
+ const parsed = tryParseJson(text);
416
+ if (parsed !== undefined) {
417
+ return createJsonResult(parsed);
418
+ }
419
+ return createTextResult(text);
420
+ }
421
+ catch (error) {
422
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
423
+ }
424
+ });
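Editor's note: the google_search handler above maps the 1-based `page` parameter onto Google's 0-based `start` offset, with an explicit `start` taking precedence. A sketch of that mapping (hypothetical helper name, logic taken directly from the handler):

```javascript
// Hypothetical helper: page/num -> Google `start` offset, as in google_search above.
// An explicit `start` (including 0) wins over the computed page offset.
function resolveStartOffset({ page = 1, num, start }) {
  return start ?? Math.max((page - 1) * (num ?? 10), 0);
}
```

So page 3 with the default 10 results per page yields `start=20`, while page 2 with `num: 25` yields `start=25`.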
425
+ server.tool("amazon_product", "Get structured Amazon product detail data with the official Scrape.do Amazon PDP API.", {
426
+ asin: zod_1.z.string().min(1).describe("Amazon ASIN."),
427
+ geocode: zod_1.z.string().min(1).describe("Amazon marketplace country code."),
428
+ zipcode: zod_1.z.string().min(1).describe("ZIP/postal code for geo-targeting."),
429
+ super_proxy: zod_1.z.boolean().optional().describe("Alias for super."),
430
+ super: zod_1.z.boolean().optional().describe("Official Amazon super proxy flag."),
431
+ language: zod_1.z.string().optional().describe("ISO 639-1 language code."),
432
+ includeHtml: zod_1.z.boolean().optional().describe("Alias for include_html."),
433
+ include_html: zod_1.z.boolean().optional().describe("Include raw HTML in the JSON response."),
434
+ }, async (params) => {
435
+ try {
436
+ ensureToken();
437
+ const requestParams = compactObject({
438
+ token: SCRAPE_DO_TOKEN,
439
+ asin: params.asin,
440
+ geocode: params.geocode,
441
+ zipcode: params.zipcode,
442
+ super: params.super_proxy ?? params.super,
443
+ language: params.language,
444
+ include_html: params.include_html ?? params.includeHtml ? true : undefined,
445
+ });
446
+ const { text } = await requestText({
447
+ method: "GET",
448
+ url: `${SCRAPE_API_BASE}/plugin/amazon/pdp`,
449
+ params: requestParams,
450
+ timeout: 60000,
451
+ });
452
+ const parsed = tryParseJson(text);
453
+ if (parsed !== undefined) {
454
+ return createJsonResult(parsed);
455
+ }
456
+ return createTextResult(text);
457
+ }
458
+ catch (error) {
459
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
460
+ }
461
+ });
462
+ server.tool("amazon_offer_listing", "Get all seller offers for an Amazon product with structured pricing, fulfillment, and Buy Box data.", {
463
+ asin: zod_1.z.string().min(1).describe("Amazon ASIN."),
464
+ geocode: zod_1.z.string().min(1).describe("Amazon marketplace country code."),
465
+ zipcode: zod_1.z.string().min(1).describe("ZIP/postal code for geo-targeting."),
466
+ super_proxy: zod_1.z.boolean().optional().describe("Alias for super."),
467
+ super: zod_1.z.boolean().optional().describe("Official Amazon super proxy flag."),
468
+ includeHtml: zod_1.z.boolean().optional().describe("Alias for include_html."),
469
+ include_html: zod_1.z.boolean().optional().describe("Include raw HTML in the JSON response."),
470
+ }, async (params) => {
471
+ try {
472
+ ensureToken();
473
+ const requestParams = compactObject({
474
+ token: SCRAPE_DO_TOKEN,
475
+ asin: params.asin,
476
+ geocode: params.geocode,
477
+ zipcode: params.zipcode,
478
+ super: params.super_proxy ?? params.super,
479
+ include_html: params.include_html ?? params.includeHtml ? true : undefined,
480
+ });
481
+ const { text } = await requestText({
482
+ method: "GET",
483
+ url: `${SCRAPE_API_BASE}/plugin/amazon/offer-listing`,
484
+ params: requestParams,
485
+ timeout: 60000,
486
+ });
487
+ const parsed = tryParseJson(text);
488
+ if (parsed !== undefined) {
489
+ return createJsonResult(parsed);
490
+ }
491
+ return createTextResult(text);
492
+ }
493
+ catch (error) {
494
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
495
+ }
496
+ });
497
+ server.tool("amazon_search", "Search Amazon or scrape Amazon category-style result pages with structured product listings.", {
498
+ keyword: zod_1.z.string().min(1).describe("Amazon keyword query."),
499
+ geocode: zod_1.z.string().min(1).describe("Amazon marketplace country code."),
500
+ zipcode: zod_1.z.string().min(1).describe("ZIP/postal code for geo-targeting."),
501
+ page: zod_1.z.number().int().positive().optional().default(1).describe("Page number."),
502
+ super_proxy: zod_1.z.boolean().optional().describe("Alias for super."),
503
+ super: zod_1.z.boolean().optional().describe("Official Amazon super proxy flag."),
504
+ language: zod_1.z.string().optional().describe("ISO 639-1 language code."),
505
+ includeHtml: zod_1.z.boolean().optional().describe("Alias for include_html."),
506
+ include_html: zod_1.z.boolean().optional().describe("Include raw HTML in the JSON response."),
507
+ }, async (params) => {
508
+ try {
509
+ ensureToken();
510
+ const requestParams = compactObject({
511
+ token: SCRAPE_DO_TOKEN,
512
+ keyword: params.keyword,
513
+ geocode: params.geocode,
514
+ zipcode: params.zipcode,
515
+ page: params.page !== 1 ? params.page : undefined,
516
+ super: params.super_proxy ?? params.super,
517
+ language: params.language,
518
+ include_html: params.include_html ?? params.includeHtml ? true : undefined,
519
+ });
520
+ const { text } = await requestText({
521
+ method: "GET",
522
+ url: `${SCRAPE_API_BASE}/plugin/amazon/search`,
523
+ params: requestParams,
524
+ timeout: 60000,
525
+ });
526
+ const parsed = tryParseJson(text);
527
+ if (parsed !== undefined) {
528
+ return createJsonResult(parsed);
529
+ }
530
+ return createTextResult(text);
531
+ }
532
+ catch (error) {
533
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
534
+ }
535
+ });
536
+ server.tool("amazon_raw_html", "Get raw HTML from any Amazon URL with ZIP-code geo-targeting.", {
537
+ url: zod_1.z.string().url().describe("Full Amazon URL to scrape."),
538
+ geocode: zod_1.z.string().min(1).describe("Amazon marketplace country code."),
539
+ zipcode: zod_1.z.string().min(1).describe("ZIP/postal code for geo-targeting."),
540
+ super_proxy: zod_1.z.boolean().optional().describe("Alias for super."),
541
+ super: zod_1.z.boolean().optional().describe("Official Amazon super proxy flag."),
542
+ language: zod_1.z.string().optional().describe("ISO 639-1 language code."),
543
+ timeout: zod_1.z.number().int().positive().optional().describe("Request timeout in milliseconds."),
544
+ }, async (params) => {
545
+ try {
546
+ ensureToken();
547
+ const requestParams = compactObject({
548
+ token: SCRAPE_DO_TOKEN,
549
+ url: params.url,
550
+ geocode: params.geocode,
551
+ zipcode: params.zipcode,
552
+ output: "html",
553
+ super: params.super_proxy ?? params.super,
554
+ language: params.language,
555
+ timeout: params.timeout,
556
+ });
557
+ const { text } = await requestText({
558
+ method: "GET",
559
+ url: `${SCRAPE_API_BASE}/plugin/amazon/`,
560
+ params: requestParams,
561
+ timeout: params.timeout ?? 60000,
562
+ });
563
+ return createTextResult(text);
564
+ }
565
+ catch (error) {
566
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
567
+ }
568
+ });
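Editor's note: all four Amazon tools above share the same alias resolution — `super_proxy` aliases `super`, `includeHtml` aliases `include_html`, and false-y flags resolve to `undefined` so `compactObject` can drop them from the query string. A sketch of that pattern (hypothetical helper name; parentheses added for clarity around the `??`/`?:` precedence the handlers rely on):

```javascript
// Hypothetical helper showing the alias normalization used by the Amazon tools above.
function normalizeAmazonAliases(params) {
  return {
    // super_proxy is the MCP-facing alias for the official `super` flag.
    super: params.super_proxy ?? params.super,
    // `a ?? b ? true : undefined` parses as `(a ?? b) ? true : undefined`:
    // either alias being truthy yields true, otherwise the key is dropped.
    include_html: (params.include_html ?? params.includeHtml) ? true : undefined,
  };
}
```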
569
+ server.tool("async_create_job", "Create a Scrape.do Async API job for batch/background scraping.", {
570
+ targets: zod_1.z.array(zod_1.z.string().url()).min(1).describe("URLs to scrape."),
571
+ method: asyncMethodSchema.optional().default("GET").describe("HTTP method for the job."),
572
+ body: zod_1.z.string().optional().describe("Request body for POST/PUT/PATCH jobs."),
573
+ geoCode: zod_1.z.string().optional().describe("Country code."),
574
+ regionalGeoCode: zod_1.z.string().optional().describe("Regional code."),
575
+ super_proxy: zod_1.z.boolean().optional().describe("Use residential/mobile proxies."),
576
+ headers: headerRecordSchema.optional().describe("Headers to send with the upstream request."),
577
+ forwardHeaders: zod_1.z.boolean().optional().describe("Use only provided headers instead of merging with Scrape.do headers."),
578
+ sessionId: zod_1.z.union([zod_1.z.number().int(), zod_1.z.string()]).optional().describe("Sticky session ID."),
579
+ device: zod_1.z.enum(["desktop", "mobile", "tablet"]).optional().describe("Device type."),
580
+ setCookies: zod_1.z.string().optional().describe("Cookies to include."),
581
+ timeout: zod_1.z.number().int().positive().optional().describe("Request timeout in milliseconds."),
582
+ retryTimeout: zod_1.z.number().int().positive().optional().describe("Retry timeout in milliseconds."),
583
+ disableRetry: zod_1.z.boolean().optional().describe("Disable automatic retries."),
584
+ transparentResponse: zod_1.z.boolean().optional().describe("Return raw target response."),
585
+ disableRedirection: zod_1.z.boolean().optional().describe("Disable redirects."),
586
+ output: zod_1.z.enum(["raw", "markdown"]).optional().describe("Output format."),
587
+ render: zod_1.z
588
+ .object({
589
+ blockResources: zod_1.z.boolean().optional(),
590
+ waitUntil: asyncWaitUntilSchema.optional(),
591
+ customWait: zod_1.z.number().int().min(0).max(35000).optional(),
592
+ waitSelector: zod_1.z.string().optional(),
593
+ playWithBrowser: zod_1.z.array(browserActionSchema).optional(),
594
+ returnJSON: zod_1.z.boolean().optional(),
595
+ showWebsocketRequests: zod_1.z.boolean().optional(),
596
+ showFrames: zod_1.z.boolean().optional(),
597
+ screenshot: zod_1.z.boolean().optional(),
598
+ fullScreenshot: zod_1.z.boolean().optional(),
599
+ particularScreenshot: zod_1.z.string().optional(),
600
+ })
601
+ .optional()
602
+ .describe("Headless browser configuration."),
603
+ webhookUrl: zod_1.z.string().url().optional().describe("Webhook URL to receive results."),
604
+ webhookHeaders: headerRecordSchema.optional().describe("Extra headers for the webhook request."),
605
+ }, async (params) => {
606
+ try {
607
+ ensureToken();
608
+ const render = params.render
609
+ ? compactObject({
610
+ BlockResources: params.render.blockResources,
611
+ WaitUntil: params.render.waitUntil,
612
+ CustomWait: params.render.customWait,
613
+ WaitSelector: params.render.waitSelector,
614
+ PlayWithBrowser: params.render.playWithBrowser,
615
+ ReturnJSON: params.render.returnJSON,
616
+ ShowWebsocketRequests: params.render.showWebsocketRequests,
617
+ ShowFrames: params.render.showFrames,
618
+ Screenshot: params.render.screenshot,
619
+ FullScreenshot: params.render.fullScreenshot,
620
+ ParticularScreenshot: params.render.particularScreenshot,
621
+ })
622
+ : undefined;
623
+ const body = compactObject({
624
+ Targets: params.targets,
625
+ Method: params.method,
626
+ Body: params.body,
627
+ GeoCode: params.geoCode,
628
+ RegionalGeoCode: params.regionalGeoCode,
629
+ Super: params.super_proxy,
630
+ Headers: normalizeHeaderRecord(params.headers),
631
+ ForwardHeaders: params.forwardHeaders,
632
+ SessionID: params.sessionId !== undefined ? String(params.sessionId) : undefined,
633
+ Device: params.device,
634
+ SetCookies: params.setCookies,
635
+ Timeout: params.timeout,
636
+ RetryTimeout: params.retryTimeout,
637
+ DisableRetry: params.disableRetry,
638
+ TransparentResponse: params.transparentResponse,
639
+ DisableRedirection: params.disableRedirection,
640
+ Output: params.output,
641
+ Render: render && Object.keys(render).length > 0 ? render : undefined,
642
+ WebhookURL: params.webhookUrl,
643
+ WebhookHeaders: normalizeHeaderRecord(params.webhookHeaders),
644
+ });
645
+ const { text } = await requestText({
646
+ method: "POST",
647
+ url: `${ASYNC_API_BASE}/api/v1/jobs`,
648
+ headers: {
649
+ "Content-Type": "application/json",
650
+ "X-Token": SCRAPE_DO_TOKEN,
651
+ },
652
+ data: body,
653
+ timeout: 60000,
654
+ });
655
+ const parsed = tryParseJson(text);
656
+ if (parsed !== undefined) {
657
+ return createJsonResult(parsed);
658
+ }
659
+ return createTextResult(text);
660
+ }
661
+ catch (error) {
662
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
663
+ }
664
+ });
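Editor's note: async_create_job translates the tool's camelCase inputs into the PascalCase body the Async API expects, stripping `undefined` entries before the POST. A minimal sketch (the `compactObject` reimplementation and `buildAsyncJobBody` name are hypothetical; the key mapping mirrors the handler above, shortened to a few representative fields):

```javascript
// Hypothetical reimplementation of compactObject: drop undefined entries.
function compactObject(obj) {
  return Object.fromEntries(Object.entries(obj).filter(([, value]) => value !== undefined));
}

// Hypothetical sketch of the PascalCase job body built by async_create_job.
function buildAsyncJobBody(params) {
  return compactObject({
    Targets: params.targets,
    Method: params.method ?? "GET",
    GeoCode: params.geoCode,
    Super: params.super_proxy,
    WebhookURL: params.webhookUrl,
  });
}
```

The body is then POSTed to `${ASYNC_API_BASE}/api/v1/jobs` with the token in an `X-Token` header rather than a query parameter, as the handler shows.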
665
+ server.tool("async_get_job", "Get Scrape.do Async API job details by job ID.", {
666
+ jobId: zod_1.z.string().min(1).describe("Job ID returned by async_create_job."),
667
+ }, async ({ jobId }) => {
30
668
  try {
31
- const response = await axios_1.default.get(SCRAPE_API_BASE, {
669
+ ensureToken();
670
+ const { text } = await requestText({
671
+ method: "GET",
672
+ url: `${ASYNC_API_BASE}/api/v1/jobs/${encodeURIComponent(jobId)}`,
673
+ headers: {
674
+ "X-Token": SCRAPE_DO_TOKEN,
675
+ },
676
+ timeout: 60000,
677
+ });
678
+ const parsed = tryParseJson(text);
679
+ if (parsed !== undefined) {
680
+ return createJsonResult(parsed);
681
+ }
682
+ return createTextResult(text);
683
+ }
684
+ catch (error) {
685
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
686
+ }
687
+ });
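Editor's note: a common pattern is to pair async_create_job with async_get_job in a polling loop until the job settles. A sketch of such a loop — note the job-status field name and its values are assumptions here, since the diff does not show the Async API response schema; `fetchJob` is injected so the loop stays testable, and in real use it would GET `${ASYNC_API_BASE}/api/v1/jobs/${jobId}` with the `X-Token` header as the handler above does:

```javascript
// Hypothetical polling loop around async_get_job. The "running"/"pending"
// status values are assumed, not taken from the Scrape.do docs.
async function waitForJob(jobId, fetchJob, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    // Treat any status other than the assumed in-flight states as terminal.
    if (job.status && job.status !== "running" && job.status !== "pending") {
      return job;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish within ${maxAttempts} polls`);
}
```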
688
+ server.tool("async_get_task", "Get Scrape.do Async API task details by job ID and task ID.", {
689
+ jobId: zod_1.z.string().min(1).describe("Job ID."),
690
+ taskId: zod_1.z.string().min(1).describe("Task ID."),
691
+ }, async ({ jobId, taskId }) => {
692
+ try {
693
+ ensureToken();
694
+ const { text } = await requestText({
695
+ method: "GET",
696
+ url: `${ASYNC_API_BASE}/api/v1/jobs/${encodeURIComponent(jobId)}/${encodeURIComponent(taskId)}`,
697
+ headers: {
698
+ "X-Token": SCRAPE_DO_TOKEN,
699
+ },
700
+ timeout: 60000,
701
+ });
702
+ const parsed = tryParseJson(text);
703
+ if (parsed !== undefined) {
704
+ return createJsonResult(parsed);
705
+ }
706
+ return createTextResult(text);
707
+ }
708
+ catch (error) {
709
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
710
+ }
711
+ });
712
+ server.tool("async_list_jobs", "List Scrape.do Async API jobs with pagination.", {
713
+ page: zod_1.z.number().int().positive().optional().default(1).describe("Page number."),
714
+ pageSize: zod_1.z.number().int().positive().max(100).optional().default(10).describe("Items per page."),
715
+ }, async ({ page, pageSize }) => {
716
+ try {
717
+ ensureToken();
718
+ const { text } = await requestText({
719
+ method: "GET",
720
+ url: `${ASYNC_API_BASE}/api/v1/jobs`,
32
721
  params: {
33
- token: SCRAPE_DO_TOKEN,
34
- url,
35
- render: render_js,
36
- super: super_proxy,
37
- output,
722
+ page,
723
+ page_size: pageSize,
724
+ },
725
+ headers: {
726
+ "X-Token": SCRAPE_DO_TOKEN,
38
727
  },
39
728
  timeout: 60000,
40
729
  });
41
- return {
42
- content: [{ type: "text", text: response.data }],
43
- };
730
+ const parsed = tryParseJson(text);
731
+ if (parsed !== undefined) {
732
+ return createJsonResult(parsed);
733
+ }
734
+ return createTextResult(text);
44
735
  }
45
736
  catch (error) {
46
- const msg = error.response?.data || error.message;
47
- return {
48
- content: [{ type: "text", text: `Error: ${msg}` }],
49
- isError: true,
50
- };
737
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
51
738
  }
52
739
  });
53
- // ─── Tool 2: google_search ───────────────────────────────────────────────────
54
- server.tool("google_search", "Search Google and return structured SERP results as JSON. Returns organic results, knowledge graph, local businesses, news stories, related questions (People Also Ask), video results, and more.", {
55
- query: zod_1.z.string().describe("Search query, e.g. 'best python frameworks 2026'"),
56
- country: zod_1.z.string().optional().default("us").describe("Country code for results, e.g. 'us', 'cn', 'gb', 'jp'"),
57
- language: zod_1.z.string().optional().default("en").describe("Interface language, e.g. 'en', 'zh', 'ja', 'de'"),
58
- page: zod_1.z.number().optional().default(1).describe("Page number (1 = first page, 2 = second page)"),
59
- time_period: zod_1.z.enum(["", "last_hour", "last_day", "last_week", "last_month", "last_year"]).optional().default("").describe("Filter results by time period"),
60
- device: zod_1.z.enum(["desktop", "mobile"]).optional().default("desktop").describe("Device type affecting SERP layout"),
61
- }, async ({ query, country, language, page, time_period, device }) => {
62
- if (!SCRAPE_DO_TOKEN) {
63
- return {
64
- content: [{ type: "text", text: "Error: SCRAPE_DO_TOKEN is not set. Get your free token at https://app.scrape.do" }],
65
- isError: true,
66
- };
740
+ server.tool("async_cancel_job", "Cancel a Scrape.do Async API job.", {
741
+ jobId: zod_1.z.string().min(1).describe("Job ID to cancel."),
742
+ }, async ({ jobId }) => {
743
+ try {
744
+ ensureToken();
745
+ const { text } = await requestText({
746
+ method: "DELETE",
747
+ url: `${ASYNC_API_BASE}/api/v1/jobs/${encodeURIComponent(jobId)}`,
748
+ headers: {
749
+ "X-Token": SCRAPE_DO_TOKEN,
750
+ },
751
+ timeout: 60000,
752
+ });
753
+ const parsed = tryParseJson(text);
754
+ if (parsed !== undefined) {
755
+ return createJsonResult(parsed);
756
+ }
757
+ return createTextResult(text);
67
758
  }
759
+ catch (error) {
760
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
761
+ }
762
+ });
763
+ server.tool("async_get_account", "Get Scrape.do Async API account/concurrency information.", {}, async () => {
68
764
  try {
69
- const params = {
70
- token: SCRAPE_DO_TOKEN,
71
- q: query,
72
- gl: country,
73
- hl: language,
74
- start: (page - 1) * 10,
75
- device,
76
- };
77
- if (time_period)
78
- params.time_period = time_period;
79
- const response = await axios_1.default.get(`${SCRAPE_API_BASE}/plugin/google/search`, {
80
- params,
765
+ ensureToken();
766
+ const { text } = await requestText({
767
+ method: "GET",
768
+ url: `${ASYNC_API_BASE}/api/v1/me`,
769
+ headers: {
770
+ "X-Token": SCRAPE_DO_TOKEN,
771
+ },
81
772
  timeout: 60000,
82
773
  });
83
- return {
84
- content: [{ type: "text", text: JSON.stringify(response.data, null, 2) }],
85
- };
774
+ const parsed = tryParseJson(text);
775
+ if (parsed !== undefined) {
776
+ return createJsonResult(parsed);
777
+ }
778
+ return createTextResult(text);
86
779
  }
87
780
  catch (error) {
88
- const msg = error.response?.data || error.message;
89
- return {
90
- content: [{ type: "text", text: `Error: ${msg}` }],
91
- isError: true,
92
- };
781
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
782
+ }
783
+ });
784
+ server.tool("proxy_mode_config", "Generate Scrape.do Proxy Mode configuration and parameter strings without exposing your configured token.", {
785
+ params: headerRecordSchema.optional().describe("Proxy mode query parameters to place into the password segment."),
786
+ }, async ({ params }) => {
787
+ try {
788
+ ensureToken();
789
+ const parameterString = buildProxyParameterString(params);
790
+ return createJsonResult({
791
+ protocol: "http or https",
792
+ host: "proxy.scrape.do",
793
+ port: 8080,
794
+ username: "<SCRAPE_DO_TOKEN>",
795
+ password: parameterString,
796
+ proxy_url_template: `http://<SCRAPE_DO_TOKEN>:${parameterString}@proxy.scrape.do:8080`,
797
+ ca_certificate_url: "https://scrape.do/scrapedo_ca.crt",
798
+ });
799
+ }
800
+ catch (error) {
801
+ return createErrorResult(`Error: ${getErrorMessage(error)}`);
93
802
  }
94
803
  });
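Editor's note: proxy_mode_config packs scraping options into the proxy password segment as the result object above shows. A sketch of how that URL is assembled — the real `buildProxyParameterString` is defined earlier in this file and its exact encoding and defaults are not shown in the diff, so this reimplementation is an assumption:

```javascript
// Hypothetical reimplementation: join proxy-mode options as key=value pairs
// (the actual encoding and empty-params default are assumptions).
function buildProxyParameterString(params = {}) {
  return Object.entries(params)
    .map(([key, value]) => `${key}=${value}`)
    .join("&");
}

// Assemble the proxy URL in the shape of proxy_mode_config's template above.
function buildProxyUrl(token, params) {
  return `http://${token}:${buildProxyParameterString(params)}@proxy.scrape.do:8080`;
}
```

The tool deliberately emits `<SCRAPE_DO_TOKEN>` placeholders instead of the configured token, so the real token never appears in tool output.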
95
- // ─── Start Server ────────────────────────────────────────────────────────────
96
804
  async function main() {
97
805
  const transport = new stdio_js_1.StdioServerTransport();
98
806
  await server.connect(transport);
99
807
  }
100
- main().catch(console.error);
808
+ main().catch((error) => {
809
+ console.error(error);
810
+ process.exitCode = 1;
811
+ });
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "scrape-do-mcp",
3
- "version": "0.1.6",
4
- "description": "MCP Server for Scrape.do - Web Scraping & Google Search with anti-bot bypass",
3
+ "version": "0.3.0",
4
+ "description": "MCP Server for Scrape.do - scraping, Google Search, Amazon, Async API, and Proxy Mode helpers",
5
5
  "main": "dist/index.js",
6
6
  "bin": {
7
7
  "scrape-do-mcp": "dist/index.js"
@@ -16,6 +16,9 @@
16
16
  "scraping",
17
17
  "web-scraper",
18
18
  "google-search",
19
+ "amazon-scraper",
20
+ "async-api",
21
+ "serp",
19
22
  "firecrawl-alternative"
20
23
  ],
21
24
  "license": "MIT",
@@ -25,6 +28,7 @@
25
28
  },
26
29
  "files": [
27
30
  "dist",
31
+ "LICENSE",
28
32
  "README.md",
29
33
  "README-ZH.md"
30
34
  ],