@deventerprisesoftware/scrapi-mcp 0.4.0 → 0.4.1
- package/LICENSE +21 -0
- package/README.md +627 -78
- package/package.json +4 -4
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 DevEnterprise Software
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
CHANGED
|
@@ -5,101 +5,171 @@
|
|
|
5
5
|
[](https://opensource.org/licenses/MIT)
|
|
6
6
|
[](https://www.npmjs.com/package/@deventerprisesoftware/scrapi-mcp)
|
|
7
7
|
[](https://hub.docker.com/r/deventerprisesoftware/scrapi-mcp)
|
|
8
|
-
[](https://smithery.ai/servers/DevEnterpriseSoftware/scrapi-mcp)
|
|
9
|
+
[](https://glama.ai/mcp/servers/DevEnterpriseSoftware/scrapi-mcp)
|
|
9
10
|
|
|
10
|
-
MCP
|
|
11
|
+
ScrAPI MCP Server lets MCP-compatible clients scrape web pages through [ScrAPI](https://scrapi.tech).
|
|
11
12
|
|
|
12
|
-
ScrAPI is
|
|
13
|
+
ScrAPI is useful when a page needs a real browser session, CAPTCHA solving, residential proxy access, cookie banner handling, JavaScript rendering, geolocation-aware fetching, or pre-scrape browser actions such as clicking and scrolling.
|
|
13
14
|
|
|
14
|
-
|
|
15
|
-
<img width="380" height="200" src="https://glama.ai/mcp/servers/@DevEnterpriseSoftware/scrapi-mcp/badge" alt="ScrAPI Server MCP server" />
|
|
16
|
-
</a>
|
|
15
|
+
## Contents
|
|
17
16
|
|
|
18
|
-
|
|
17
|
+
- [Features](#features)
|
|
18
|
+
- [Available tools](#available-tools)
|
|
19
|
+
- [Prerequisites](#prerequisites)
|
|
20
|
+
- [API key](#api-key)
|
|
21
|
+
- [Quick start](#quick-start)
|
|
22
|
+
- [MCP client setup](#mcp-client-setup)
|
|
23
|
+
- [HTTP transport](#http-transport)
|
|
24
|
+
- [Cloud-hosted server](#cloud-hosted-server)
|
|
25
|
+
- [Usage examples](#usage-examples)
|
|
26
|
+
- [Browser commands](#browser-commands)
|
|
27
|
+
- [Troubleshooting](#troubleshooting)
|
|
28
|
+
- [Development](#development)
|
|
29
|
+
- [License](#license)
|
|
19
30
|
|
|
20
|
-
|
|
21
|
-
- Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML.
|
|
22
|
-
Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions.
|
|
23
|
-
The result will be in HTML which is preferable if advanced parsing is required.
|
|
24
|
-
- Inputs:
|
|
25
|
-
- `url` (string, required): The URL to scrape
|
|
26
|
-
- `browserCommands` (string, optional): JSON array of browser commands to execute before scraping
|
|
27
|
-
- Returns: HTML content of the URL
|
|
31
|
+
## Features
|
|
28
32
|
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
- `browserCommands` (string, optional): JSON array of browser commands to execute before scraping
|
|
36
|
-
- Returns: Markdown content of the URL
|
|
33
|
+
- Scrape any valid `https://` or `http://` URL through ScrAPI.
|
|
34
|
+
- Return either raw HTML or readable Markdown.
|
|
35
|
+
- Run browser commands before scraping.
|
|
36
|
+
- Use stdio transport for desktop MCP clients.
|
|
37
|
+
- Use Streamable HTTP transport for remote MCP clients and local testing.
|
|
38
|
+
- Run with `npx`, Docker, Smithery, or from source.
|
|
37
39
|
|
|
38
|
-
##
|
|
40
|
+
## Available Tools
|
|
39
41
|
|
|
40
|
-
|
|
42
|
+
### `scrape_url_html`
|
|
41
43
|
|
|
42
|
-
|
|
43
|
-
- Filling out forms
|
|
44
|
-
- Selecting dropdown options
|
|
45
|
-
- Scrolling to load dynamic content
|
|
46
|
-
- Waiting for elements to appear
|
|
47
|
-
- Executing custom JavaScript
|
|
44
|
+
Scrapes a URL and returns the result as HTML.
|
|
48
45
|
|
|
49
|
-
|
|
46
|
+
Use this when you need the page structure, links, tables, embedded metadata, or custom downstream parsing.
|
|
50
47
|
|
|
51
|
-
|
|
48
|
+
Inputs:
|
|
52
49
|
|
|
53
|
-
|
|
|
54
|
-
|
|
|
55
|
-
|
|
|
56
|
-
|
|
|
57
|
-
| **Select** | `{"select": {"select[name='country']": "USA"}}` | Select from dropdown (by value or text) |
|
|
58
|
-
| **Scroll** | `{"scroll": 1000}` | Scroll down by pixels (negative values scroll up) |
|
|
59
|
-
| **Wait** | `{"wait": 5000}` | Wait for milliseconds (max 15000) |
|
|
60
|
-
| **WaitFor** | `{"waitfor": "#elementId"}` | Wait for element to appear in DOM |
|
|
61
|
-
| **JavaScript** | `{"javascript": "console.log('test')"}` | Execute custom JavaScript code |
|
|
50
|
+
| Name | Type | Required | Description |
|
|
51
|
+
| ---- | ---- | -------- | ----------- |
|
|
52
|
+
| `url` | string | Yes | The absolute URL to scrape. Must be a valid URL. |
|
|
53
|
+
| `browserCommands` | string | No | JSON array string of browser commands to execute before scraping. |
|
|
62
54
|
|
|
63
|
-
|
|
55
|
+
Returns:
|
|
64
56
|
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
57
|
+
- `text/html` content from the requested page.
|
|
58
|
+
- `isError: true` with the ScrAPI error body when the upstream request fails.
|
|
59
|
+
|
|
60
|
+
### `scrape_url_markdown`
|
|
61
|
+
|
|
62
|
+
Scrapes a URL and returns the result as Markdown.
|
|
63
|
+
|
|
64
|
+
Use this when the text content matters more than the HTML structure, for example article extraction, product copy, search result summaries, or LLM-friendly page analysis.
|
|
65
|
+
|
|
66
|
+
Inputs:
|
|
67
|
+
|
|
68
|
+
| Name | Type | Required | Description |
|
|
69
|
+
| ---- | ---- | -------- | ----------- |
|
|
70
|
+
| `url` | string | Yes | The absolute URL to scrape. Must be a valid URL. |
|
|
71
|
+
| `browserCommands` | string | No | JSON array string of browser commands to execute before scraping. |
|
|
72
|
+
|
|
73
|
+
Returns:
|
|
74
|
+
|
|
75
|
+
- `text/markdown` content from the requested page.
|
|
76
|
+
- `isError: true` with the ScrAPI error body when the upstream request fails.
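A client can handle the two return cases above roughly as follows. This is a minimal sketch, assuming the tool result follows the common MCP shape of a `content` array of text items plus an optional `isError` flag. The `TextItem` and `ToolResult` types and the `unwrapScrape` helper are hypothetical names for illustration, not part of this package.

```typescript
// Assumed, simplified shape of an MCP tool result; not taken from this package's source.
interface TextItem {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextItem[];
  isError?: boolean;
}

// Hypothetical helper: returns the scraped body, or throws with the ScrAPI error body.
function unwrapScrape(result: ToolResult): string {
  const body = result.content.map((item) => item.text).join("\n");
  if (result.isError === true) {
    throw new Error(`ScrAPI request failed: ${body}`);
  }
  return body;
}

// Example values standing in for real tool responses.
const ok: ToolResult = { content: [{ type: "text", text: "# Example Domain" }] };
const failed: ToolResult = {
  content: [{ type: "text", text: "Invalid URL" }],
  isError: true,
};

const markdown = unwrapScrape(ok); // "# Example Domain"
```

Throwing on `isError` keeps the success path free of error checks; a client that prefers to show the error body to the model can return it instead.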
|
|
77
|
+
|
|
78
|
+
## Prerequisites
|
|
79
|
+
|
|
80
|
+
Choose a runtime (Node.js or Docker) and make sure you have an MCP-compatible client:
|
|
81
|
+
|
|
82
|
+
- Node.js 18 or newer for `npx` or local development.
|
|
83
|
+
- Docker for container-based usage.
|
|
84
|
+
- An MCP-compatible client such as Claude Desktop, MCP Inspector, or another client that supports stdio or Streamable HTTP MCP servers.
|
|
85
|
+
|
|
86
|
+
## API Key
|
|
87
|
+
|
|
88
|
+
Set `SCRAPI_API_KEY` to use your ScrAPI account:
|
|
89
|
+
|
|
90
|
+
```bash
|
|
91
|
+
export SCRAPI_API_KEY="your-scrapi-api-key"
|
|
74
92
|
```
|
|
75
93
|
|
|
76
|
-
|
|
94
|
+
PowerShell:
|
|
77
95
|
|
|
78
|
-
|
|
96
|
+
```powershell
|
|
97
|
+
$env:SCRAPI_API_KEY = "your-scrapi-api-key"
|
|
98
|
+
```
|
|
79
99
|
|
|
80
|
-
|
|
100
|
+
An API key is optional. Without one, ScrAPI currently allows limited free usage with lower concurrency and queueing priority.
|
|
81
101
|
|
|
82
|
-
##
|
|
102
|
+
## Quick Start
|
|
83
103
|
|
|
84
|
-
###
|
|
104
|
+
### Run with NPX
|
|
85
105
|
|
|
86
|
-
|
|
106
|
+
The default transport is stdio, which is the transport most desktop MCP clients use when they launch a local server process.
|
|
87
107
|
|
|
88
|
-
|
|
108
|
+
```bash
|
|
109
|
+
npx -y @deventerprisesoftware/scrapi-mcp
|
|
110
|
+
```
|
|
89
111
|
|
|
90
|
-
|
|
112
|
+
With an API key:
|
|
91
113
|
|
|
92
|
-
|
|
114
|
+
```bash
|
|
115
|
+
SCRAPI_API_KEY="your-scrapi-api-key" npx -y @deventerprisesoftware/scrapi-mcp
|
|
116
|
+
```
|
|
93
117
|
|
|
94
|
-
|
|
118
|
+
PowerShell:
|
|
119
|
+
|
|
120
|
+
```powershell
|
|
121
|
+
$env:SCRAPI_API_KEY = "your-scrapi-api-key"
|
|
122
|
+
npx -y @deventerprisesoftware/scrapi-mcp
|
|
123
|
+
```
|
|
95
124
|
|
|
96
|
-
|
|
125
|
+
### Run with Docker
|
|
97
126
|
|
|
98
|
-
|
|
127
|
+
The published Docker image starts in HTTP mode by default and listens on port `5000`.
|
|
128
|
+
|
|
129
|
+
```bash
|
|
130
|
+
docker run --rm -p 5000:5000 -e SCRAPI_API_KEY="your-scrapi-api-key" deventerprisesoftware/scrapi-mcp
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
MCP endpoint:
|
|
134
|
+
|
|
135
|
+
```text
|
|
136
|
+
http://localhost:5000/mcp
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
To run the container as a stdio server for a local MCP client:
|
|
140
|
+
|
|
141
|
+
```bash
|
|
142
|
+
docker run -i --rm -e TRANSPORT=stdio -e SCRAPI_API_KEY="your-scrapi-api-key" deventerprisesoftware/scrapi-mcp
|
|
143
|
+
```
|
|
99
144
|
|
|
100
|
-
|
|
145
|
+
## MCP Client Setup
|
|
101
146
|
|
|
102
|
-
|
|
147
|
+
Most local coding assistants use one of these two configuration shapes:
|
|
148
|
+
|
|
149
|
+
- Stdio: the client starts this package with `npx` or Docker and communicates over stdin/stdout.
|
|
150
|
+
- Streamable HTTP: you start this server yourself with `TRANSPORT=http`, then point the client at `http://localhost:5000/mcp` or your deployed URL.
|
|
151
|
+
|
|
152
|
+
When a client has a tool timeout setting, set it to about five minutes: `300000` if the client expects milliseconds, or `300` if it expects seconds. ScrAPI can take several minutes for pages that require CAPTCHA solving, browser rendering, or multiple browser commands.
|
|
153
|
+
|
|
154
|
+
### Claude Desktop with NPX
|
|
155
|
+
|
|
156
|
+
Add this to your `claude_desktop_config.json`:
|
|
157
|
+
|
|
158
|
+
```json
|
|
159
|
+
{
|
|
160
|
+
"mcpServers": {
|
|
161
|
+
"ScrAPI": {
|
|
162
|
+
"command": "npx",
|
|
163
|
+
"args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
|
|
164
|
+
"env": {
|
|
165
|
+
"SCRAPI_API_KEY": "your-scrapi-api-key"
|
|
166
|
+
}
|
|
167
|
+
}
|
|
168
|
+
}
|
|
169
|
+
}
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
### Claude Desktop with Docker
|
|
103
173
|
|
|
104
174
|
```json
|
|
105
175
|
{
|
|
@@ -115,42 +185,521 @@ Add the following to your `claude_desktop_config.json`:
|
|
|
115
185
|
"deventerprisesoftware/scrapi-mcp"
|
|
116
186
|
],
|
|
117
187
|
"env": {
|
|
118
|
-
"SCRAPI_API_KEY": "
|
|
188
|
+
"SCRAPI_API_KEY": "your-scrapi-api-key"
|
|
119
189
|
}
|
|
120
190
|
}
|
|
121
191
|
}
|
|
122
192
|
}
|
|
123
193
|
```
|
|
124
194
|
|
|
125
|
-
|
|
195
|
+
After changing the config, restart Claude Desktop. You should see the two ScrAPI tools available in the MCP tools list.
|
|
196
|
+
|
|
197
|
+

|
|
198
|
+
|
|
199
|
+
### Cursor
|
|
200
|
+
|
|
201
|
+
Cursor supports project configuration at `.cursor/mcp.json` and global configuration at `~/.cursor/mcp.json`. See the [Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol).
|
|
202
|
+
|
|
203
|
+
Stdio configuration:
|
|
126
204
|
|
|
127
205
|
```json
|
|
128
206
|
{
|
|
129
207
|
"mcpServers": {
|
|
130
|
-
"
|
|
208
|
+
"scrapi": {
|
|
131
209
|
"command": "npx",
|
|
132
|
-
"args": [
|
|
133
|
-
"-y",
|
|
134
|
-
"@deventerprisesoftware/scrapi-mcp"
|
|
135
|
-
],
|
|
210
|
+
"args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
|
|
136
211
|
"env": {
|
|
137
|
-
"SCRAPI_API_KEY": "
|
|
212
|
+
"SCRAPI_API_KEY": "${env:SCRAPI_API_KEY}"
|
|
138
213
|
}
|
|
139
214
|
}
|
|
140
215
|
}
|
|
141
216
|
}
|
|
142
217
|
```
|
|
143
218
|
|
|
144
|
-
|
|
219
|
+
HTTP configuration:
|
|
220
|
+
|
|
221
|
+
```json
|
|
222
|
+
{
|
|
223
|
+
"mcpServers": {
|
|
224
|
+
"scrapi": {
|
|
225
|
+
"url": "http://localhost:5000/mcp"
|
|
226
|
+
}
|
|
227
|
+
}
|
|
228
|
+
}
|
|
229
|
+
```
|
|
230
|
+
|
|
231
|
+
For HTTP, start the server first:
|
|
232
|
+
|
|
233
|
+
```bash
|
|
234
|
+
TRANSPORT=http PORT=5000 SCRAPI_API_KEY="your-scrapi-api-key" npx -y @deventerprisesoftware/scrapi-mcp
|
|
235
|
+
```
|
|
236
|
+
|
|
237
|
+
### Windsurf
|
|
238
|
+
|
|
239
|
+
Windsurf Cascade stores MCP servers in `~/.codeium/windsurf/mcp_config.json`. You can also add servers from `Windsurf Settings` > `Cascade` > `MCP Servers`. See the [Windsurf MCP documentation](https://docs.windsurf.com/windsurf/cascade/mcp).
|
|
240
|
+
|
|
241
|
+
Stdio configuration:
|
|
145
242
|
|
|
146
|
-
|
|
243
|
+
```json
|
|
244
|
+
{
|
|
245
|
+
"mcpServers": {
|
|
246
|
+
"scrapi": {
|
|
247
|
+
"command": "npx",
|
|
248
|
+
"args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
|
|
249
|
+
"env": {
|
|
250
|
+
"SCRAPI_API_KEY": "${env:SCRAPI_API_KEY}"
|
|
251
|
+
}
|
|
252
|
+
}
|
|
253
|
+
}
|
|
254
|
+
}
|
|
255
|
+
```
|
|
256
|
+
|
|
257
|
+
HTTP configuration:
|
|
258
|
+
|
|
259
|
+
```json
|
|
260
|
+
{
|
|
261
|
+
"mcpServers": {
|
|
262
|
+
"scrapi": {
|
|
263
|
+
"serverUrl": "http://localhost:5000/mcp"
|
|
264
|
+
}
|
|
265
|
+
}
|
|
266
|
+
}
|
|
267
|
+
```
|
|
268
|
+
|
|
269
|
+
Windsurf supports `serverUrl` or `url` for remote HTTP MCP servers. If your team uses enterprise MCP controls, the server ID in the admin whitelist must match the key name, for example `scrapi`.
|
|
270
|
+
|
|
271
|
+
### Kilo Code
|
|
272
|
+
|
|
273
|
+
Kilo Code stores MCP configuration in the main Kilo config file. Use `~/.config/kilo/kilo.jsonc` for global configuration, `kilo.jsonc` in the project root, or `.kilo/kilo.jsonc` for project-specific configuration. See the [Kilo Code MCP documentation](https://kilo.ai/docs/automate/mcp/using-in-kilo-code).
|
|
274
|
+
|
|
275
|
+
Local stdio configuration:
|
|
276
|
+
|
|
277
|
+
```jsonc
|
|
278
|
+
{
|
|
279
|
+
"mcp": {
|
|
280
|
+
"scrapi": {
|
|
281
|
+
"type": "local",
|
|
282
|
+
"command": ["npx", "-y", "@deventerprisesoftware/scrapi-mcp"],
|
|
283
|
+
"environment": {
|
|
284
|
+
"SCRAPI_API_KEY": "your-scrapi-api-key"
|
|
285
|
+
},
|
|
286
|
+
"enabled": true,
|
|
287
|
+
"timeout": 300000
|
|
288
|
+
}
|
|
289
|
+
}
|
|
290
|
+
}
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
Remote HTTP configuration:
|
|
294
|
+
|
|
295
|
+
```jsonc
|
|
296
|
+
{
|
|
297
|
+
"mcp": {
|
|
298
|
+
"scrapi": {
|
|
299
|
+
"type": "remote",
|
|
300
|
+
"url": "http://localhost:5000/mcp",
|
|
301
|
+
"enabled": true,
|
|
302
|
+
"timeout": 300000
|
|
303
|
+
}
|
|
304
|
+
}
|
|
305
|
+
}
|
|
306
|
+
```
|
|
307
|
+
|
|
308
|
+
On Windows, if `npx` is not found from the Kilo Code UI, use `cmd` as the command and pass `/c`, `npx`, `-y`, and `@deventerprisesoftware/scrapi-mcp` as arguments.
|
|
309
|
+
|
|
310
|
+
### Codex
|
|
311
|
+
|
|
312
|
+
Codex supports MCP servers in the CLI and IDE extension. Both use the same MCP configuration. By default, Codex stores it in `~/.codex/config.toml`; trusted projects can also use `.codex/config.toml`. See the [Codex MCP documentation](https://developers.openai.com/codex/mcp).
|
|
313
|
+
|
|
314
|
+
Add a stdio server with the Codex CLI:
|
|
315
|
+
|
|
316
|
+
```bash
|
|
317
|
+
codex mcp add scrapi --env SCRAPI_API_KEY="your-scrapi-api-key" -- npx -y @deventerprisesoftware/scrapi-mcp
|
|
318
|
+
codex mcp list
|
|
319
|
+
```
|
|
320
|
+
|
|
321
|
+
Equivalent `config.toml` stdio configuration:
|
|
322
|
+
|
|
323
|
+
```toml
|
|
324
|
+
[mcp_servers.scrapi]
|
|
325
|
+
command = "npx"
|
|
326
|
+
args = ["-y", "@deventerprisesoftware/scrapi-mcp"]
|
|
327
|
+
startup_timeout_sec = 20
|
|
328
|
+
tool_timeout_sec = 300
|
|
329
|
+
|
|
330
|
+
[mcp_servers.scrapi.env]
|
|
331
|
+
SCRAPI_API_KEY = "your-scrapi-api-key"
|
|
332
|
+
```
|
|
333
|
+
|
|
334
|
+
HTTP configuration:
|
|
335
|
+
|
|
336
|
+
```toml
|
|
337
|
+
[mcp_servers.scrapi]
|
|
338
|
+
url = "http://localhost:5000/mcp"
|
|
339
|
+
tool_timeout_sec = 300
|
|
340
|
+
```
|
|
341
|
+
|
|
342
|
+
In the Codex terminal UI, run `/mcp` to confirm the server is connected.
|
|
343
|
+
|
|
344
|
+
### VS Code
|
|
345
|
+
|
|
346
|
+
VS Code stores MCP configuration in `.vscode/mcp.json` for a workspace or in your user profile. The top-level key is `servers`, not `mcpServers`. See the [VS Code MCP configuration reference](https://code.visualstudio.com/docs/agents/reference/mcp-configuration).
|
|
347
|
+
|
|
348
|
+
Stdio configuration:
|
|
349
|
+
|
|
350
|
+
```json
|
|
351
|
+
{
|
|
352
|
+
"inputs": [
|
|
353
|
+
{
|
|
354
|
+
"type": "promptString",
|
|
355
|
+
"id": "scrapi-api-key",
|
|
356
|
+
"description": "ScrAPI API key",
|
|
357
|
+
"password": true
|
|
358
|
+
}
|
|
359
|
+
],
|
|
360
|
+
"servers": {
|
|
361
|
+
"scrapi": {
|
|
362
|
+
"type": "stdio",
|
|
363
|
+
"command": "npx",
|
|
364
|
+
"args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
|
|
365
|
+
"env": {
|
|
366
|
+
"SCRAPI_API_KEY": "${input:scrapi-api-key}"
|
|
367
|
+
}
|
|
368
|
+
}
|
|
369
|
+
}
|
|
370
|
+
}
|
|
371
|
+
```
|
|
372
|
+
|
|
373
|
+
HTTP configuration:
|
|
374
|
+
|
|
375
|
+
```json
|
|
376
|
+
{
|
|
377
|
+
"servers": {
|
|
378
|
+
"scrapi": {
|
|
379
|
+
"type": "http",
|
|
380
|
+
"url": "http://localhost:5000/mcp"
|
|
381
|
+
}
|
|
382
|
+
}
|
|
383
|
+
}
|
|
384
|
+
```
|
|
385
|
+
|
|
386
|
+
Use the Command Palette commands `MCP: Add Server`, `MCP: List Servers`, and `MCP: Reset Cached Tools` to add, inspect, and refresh MCP servers.
|
|
387
|
+
|
|
388
|
+
### Claude Code
|
|
389
|
+
|
|
390
|
+
Claude Code supports MCP servers through the `claude mcp` CLI and the `/mcp` command inside Claude Code. See the [Claude Code MCP documentation](https://code.claude.com/docs/en/mcp).
|
|
391
|
+
|
|
392
|
+
Add a stdio server:
|
|
393
|
+
|
|
394
|
+
```bash
|
|
395
|
+
claude mcp add --transport stdio --env SCRAPI_API_KEY="your-scrapi-api-key" scrapi -- npx -y @deventerprisesoftware/scrapi-mcp
|
|
396
|
+
claude mcp list
|
|
397
|
+
```
|
|
398
|
+
|
|
399
|
+
Add an HTTP server:
|
|
400
|
+
|
|
401
|
+
```bash
|
|
402
|
+
claude mcp add --transport http scrapi http://localhost:5000/mcp
|
|
403
|
+
claude mcp list
|
|
404
|
+
```
|
|
147
405
|
|
|
148
|
-
|
|
406
|
+
To make the server available across all Claude Code projects, add `--scope user` before the server name:
|
|
407
|
+
|
|
408
|
+
```bash
|
|
409
|
+
claude mcp add --transport stdio --scope user --env SCRAPI_API_KEY="your-scrapi-api-key" scrapi -- npx -y @deventerprisesoftware/scrapi-mcp
|
|
410
|
+
```
|
|
411
|
+
|
|
412
|
+
Inside Claude Code, run `/mcp` to confirm the server is connected.
|
|
413
|
+
|
|
414
|
+
### Generic Stdio MCP Client
|
|
415
|
+
|
|
416
|
+
Use this shape for clients that accept a command, arguments, and environment variables:
|
|
417
|
+
|
|
418
|
+
```json
|
|
419
|
+
{
|
|
420
|
+
"name": "ScrAPI",
|
|
421
|
+
"command": "npx",
|
|
422
|
+
"args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
|
|
423
|
+
"env": {
|
|
424
|
+
"SCRAPI_API_KEY": "your-scrapi-api-key"
|
|
425
|
+
}
|
|
426
|
+
}
|
|
427
|
+
```
|
|
428
|
+
|
|
429
|
+
## HTTP Transport
|
|
430
|
+
|
|
431
|
+
Set `TRANSPORT=http` to run the server over Streamable HTTP.
|
|
432
|
+
|
|
433
|
+
```bash
|
|
434
|
+
TRANSPORT=http PORT=5000 SCRAPI_API_KEY="your-scrapi-api-key" npx -y @deventerprisesoftware/scrapi-mcp
|
|
435
|
+
```
|
|
436
|
+
|
|
437
|
+
PowerShell:
|
|
438
|
+
|
|
439
|
+
```powershell
|
|
440
|
+
$env:TRANSPORT = "http"
|
|
441
|
+
$env:PORT = "5000"
|
|
442
|
+
$env:SCRAPI_API_KEY = "your-scrapi-api-key"
|
|
443
|
+
npx -y @deventerprisesoftware/scrapi-mcp
|
|
444
|
+
```
|
|
445
|
+
|
|
446
|
+
The MCP endpoint is:
|
|
447
|
+
|
|
448
|
+
```text
|
|
449
|
+
http://localhost:5000/mcp
|
|
450
|
+
```
|
|
451
|
+
|
|
452
|
+
Environment variables:
|
|
453
|
+
|
|
454
|
+
| Name | Default | Description |
|
|
455
|
+
| ---- | ------- | ----------- |
|
|
456
|
+
| `SCRAPI_API_KEY` | Limited default key | ScrAPI API key used when calling the ScrAPI scrape API. |
|
|
457
|
+
| `TRANSPORT` | `stdio` | Use `stdio` or `http`. |
|
|
458
|
+
| `PORT` | `5000` | Port used when `TRANSPORT=http`. |
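The defaults in the table can be read as the following resolution logic. This is only a sketch under stated assumptions: `resolveConfig`, `localEndpoint`, and `ServerConfig` are hypothetical names, and treating any unrecognized `TRANSPORT` value as `stdio` is an assumption rather than documented server behavior.

```typescript
type Transport = "stdio" | "http";

interface ServerConfig {
  transport: Transport;
  port: number;
  apiKey: string | undefined; // undefined means the limited default key applies
}

// Hypothetical resolver mirroring the documented defaults (stdio, port 5000).
function resolveConfig(env: Record<string, string | undefined>): ServerConfig {
  // Assumption: anything other than "http" falls back to stdio.
  const transport: Transport = env["TRANSPORT"] === "http" ? "http" : "stdio";
  const parsedPort = parseInt(env["PORT"] ?? "", 10);
  const port = !isNaN(parsedPort) && parsedPort > 0 ? parsedPort : 5000;
  return { transport, port, apiKey: env["SCRAPI_API_KEY"] };
}

// Hypothetical helper: the local MCP endpoint for a resolved config.
function localEndpoint(config: ServerConfig): string {
  return `http://localhost:${config.port}/mcp`;
}

const defaults = resolveConfig({});
const custom = resolveConfig({ TRANSPORT: "http", PORT: "8080" });
const customUrl = localEndpoint(custom); // "http://localhost:8080/mcp"
```

Passing the environment in as a plain record keeps the sketch independent of any runtime; in Node.js the argument would typically be `process.env`.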
|
|
459
|
+
|
|
460
|
+
### Test with MCP Inspector
|
|
461
|
+
|
|
462
|
+
Stdio mode:
|
|
463
|
+
|
|
464
|
+
```bash
|
|
465
|
+
npx @modelcontextprotocol/inspector npx -y @deventerprisesoftware/scrapi-mcp
|
|
466
|
+
```
|
|
467
|
+
|
|
468
|
+
HTTP mode:
|
|
469
|
+
|
|
470
|
+
```bash
|
|
471
|
+
TRANSPORT=http PORT=5000 npx -y @deventerprisesoftware/scrapi-mcp
|
|
472
|
+
```
|
|
473
|
+
|
|
474
|
+
Then open MCP Inspector and connect to:
|
|
475
|
+
|
|
476
|
+
```text
|
|
477
|
+
http://localhost:5000/mcp
|
|
478
|
+
```
|
|
479
|
+
|
|
480
|
+

|
|
481
|
+
|
|
482
|
+
## Cloud-Hosted Server
|
|
483
|
+
|
|
484
|
+
ScrAPI also provides hosted MCP endpoints:
|
|
485
|
+
|
|
486
|
+
```text
|
|
487
|
+
Streamable HTTP: https://api.scrapi.tech/mcp
|
|
488
|
+
SSE: https://api.scrapi.tech/mcp/sse
|
|
489
|
+
```
|
|
490
|
+
|
|
491
|
+
Cloud MCP servers are not yet supported by every MCP client. They are most useful for custom clients, MCP Inspector, or platforms that support remote MCP servers.
|
|
492
|
+
|
|
493
|
+
To authenticate with your ScrAPI API key, pass it as a query parameter or request header:
|
|
494
|
+
|
|
495
|
+
- Query parameter: `https://api.scrapi.tech/mcp?apiKey=<YOUR_API_KEY>`
|
|
496
|
+
- Request header: `X-API-KEY: <YOUR_API_KEY>`
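The two authentication options above can be prepared like this. It is a minimal sketch: the endpoint, the `apiKey` parameter, and the `X-API-KEY` header come from the list above, while the helper names are hypothetical and how the values are passed to a client depends on the MCP client library in use.

```typescript
const HOSTED_ENDPOINT = "https://api.scrapi.tech/mcp";

// Option 1: API key as a query parameter on the endpoint URL.
function endpointWithQueryKey(apiKey: string): string {
  return `${HOSTED_ENDPOINT}?apiKey=${encodeURIComponent(apiKey)}`;
}

// Option 2: API key as a request header, leaving the URL unchanged.
function headersWithKey(apiKey: string): Record<string, string> {
  return { "X-API-KEY": apiKey };
}

const hostedUrl = endpointWithQueryKey("your-scrapi-api-key");
const hostedHeaders = headersWithKey("your-scrapi-api-key");
```

Where the client allows custom headers, the header form keeps the key out of URLs that may end up in logs or shell history.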
|
|
497
|
+
|
|
498
|
+
## Usage Examples
|
|
499
|
+
|
|
500
|
+
The exact interaction depends on your MCP client. In most clients, you can either ask the model to use the ScrAPI tool or call the tool directly from a tool inspector.
|
|
501
|
+
|
|
502
|
+
### Scrape a Page as Markdown
|
|
503
|
+
|
|
504
|
+
Tool:
|
|
505
|
+
|
|
506
|
+
```text
|
|
507
|
+
scrape_url_markdown
|
|
508
|
+
```
|
|
509
|
+
|
|
510
|
+
Arguments:
|
|
511
|
+
|
|
512
|
+
```json
|
|
513
|
+
{
|
|
514
|
+
"url": "https://example.com"
|
|
515
|
+
}
|
|
516
|
+
```
|
|
517
|
+
|
|
518
|
+
Example prompt:
|
|
519
|
+
|
|
520
|
+
```text
|
|
521
|
+
Use ScrAPI to scrape https://example.com as Markdown and summarize the page.
|
|
522
|
+
```
|
|
523
|
+
|
|
524
|
+
### Scrape a Page as HTML
|
|
525
|
+
|
|
526
|
+
Tool:
|
|
527
|
+
|
|
528
|
+
```text
|
|
529
|
+
scrape_url_html
|
|
530
|
+
```
|
|
531
|
+
|
|
532
|
+
Arguments:
|
|
533
|
+
|
|
534
|
+
```json
|
|
535
|
+
{
|
|
536
|
+
"url": "https://example.com"
|
|
537
|
+
}
|
|
538
|
+
```
|
|
539
|
+
|
|
540
|
+
Example prompt:
|
|
541
|
+
|
|
542
|
+
```text
|
|
543
|
+
Use ScrAPI to scrape https://example.com as HTML and extract every link.
|
|
544
|
+
```
|
|
545
|
+
|
|
546
|
+
### Accept Cookies Before Scraping
|
|
547
|
+
|
|
548
|
+
The `browserCommands` value must be a string containing a JSON array.
|
|
549
|
+
|
|
550
|
+
```json
|
|
551
|
+
{
|
|
552
|
+
"url": "https://example.com",
|
|
553
|
+
"browserCommands": "[{\"click\":\"#accept-cookies\"},{\"wait\":1000}]"
|
|
554
|
+
}
|
|
555
|
+
```
|
|
556
|
+
|
|
557
|
+
### Search a Site Before Scraping Results
|
|
558
|
+
|
|
559
|
+
```json
|
|
560
|
+
{
|
|
561
|
+
"url": "https://example.com/search",
|
|
562
|
+
"browserCommands": "[{\"input\":{\"input[name='q']\":\"web scraping\"}},{\"click\":\"button[type='submit']\"},{\"waitfor\":\"#results\"}]"
|
|
563
|
+
}
|
|
564
|
+
```
|
|
565
|
+
|
|
566
|
+
### Load More Content
|
|
567
|
+
|
|
568
|
+
```json
|
|
569
|
+
{
|
|
570
|
+
"url": "https://example.com/products",
|
|
571
|
+
"browserCommands": "[{\"scroll\":1200},{\"wait\":1000},{\"click\":\"button.load-more\"},{\"waitfor\":\".product-card:nth-child(25)\"}]"
|
|
572
|
+
}
|
|
573
|
+
```
|
|
574
|
+
|
|
575
|
+
## Browser Commands
|
|
576
|
+
|
|
577
|
+
Both tools support optional browser commands that interact with the page before ScrAPI captures the final result.
|
|
578
|
+
|
|
579
|
+
Commands are provided as a JSON array string. They are executed with human-like behavior such as random mouse movement and variable typing speed.
|
|
580
|
+
|
|
581
|
+
| Command | Format | Description |
|
|
582
|
+
| ------- | ------ | ----------- |
|
|
583
|
+
| Click | `{"click": "#buttonId"}` | Click an element by CSS selector. |
|
|
584
|
+
| Input | `{"input": {"input[name='email']": "value"}}` | Fill an input field. |
|
|
585
|
+
| Select | `{"select": {"select[name='country']": "USA"}}` | Select an option by value or visible text. |
|
|
586
|
+
| Scroll | `{"scroll": 1000}` | Scroll down by pixels. Use a negative value to scroll up. |
|
|
587
|
+
| Wait | `{"wait": 5000}` | Wait for milliseconds. Maximum: `15000`. |
|
|
588
|
+
| WaitFor | `{"waitfor": "#elementId"}` | Wait for an element to appear in the DOM. |
|
|
589
|
+
| JavaScript | `{"javascript": "console.log('test')"}` | Execute custom JavaScript. |
|
|
590
|
+
|
|
591
|
+
Readable command array:
|
|
592
|
+
|
|
593
|
+
```json
|
|
594
|
+
[
|
|
595
|
+
{ "click": "#accept-cookies" },
|
|
596
|
+
{ "wait": 2000 },
|
|
597
|
+
{ "input": { "input[name='search']": "web scraping" } },
|
|
598
|
+
{ "click": "button[type='submit']" },
|
|
599
|
+
{ "waitfor": "#results" },
|
|
600
|
+
{ "scroll": 500 }
|
|
601
|
+
]
|
|
602
|
+
```
|
|
603
|
+
|
|
604
|
+
Escaped as an MCP tool argument:
|
|
605
|
+
|
|
606
|
+
```json
|
|
607
|
+
{
|
|
608
|
+
"url": "https://example.com",
|
|
609
|
+
"browserCommands": "[{\"click\":\"#accept-cookies\"},{\"wait\":2000},{\"input\":{\"input[name='search']\":\"web scraping\"}},{\"click\":\"button[type='submit']\"},{\"waitfor\":\"#results\"},{\"scroll\":500}]"
|
|
610
|
+
}
|
|
611
|
+
```
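Escaping the array by hand is error-prone, so it can help to build the string programmatically. The sketch below types the command shapes from the table above and serializes them with `JSON.stringify`; the `BrowserCommand` type and `toBrowserCommands` helper are illustrative names, not exports of this package.

```typescript
// Command shapes taken from the table above; the type name is illustrative.
type BrowserCommand =
  | { click: string }
  | { input: Record<string, string> }
  | { select: Record<string, string> }
  | { scroll: number }
  | { wait: number }
  | { waitfor: string }
  | { javascript: string };

// Hypothetical helper: serializes commands into the JSON array string the tools expect.
function toBrowserCommands(commands: BrowserCommand[]): string {
  return JSON.stringify(commands);
}

// Tool arguments with browserCommands as a string, not a raw array.
const toolArgs = {
  url: "https://example.com",
  browserCommands: toBrowserCommands([
    { click: "#accept-cookies" },
    { wait: 2000 },
    { waitfor: "#results" },
  ]),
};
```

When `toolArgs` is itself serialized as JSON for the MCP request, the inner quotes are escaped automatically, which produces the escaped form shown above.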
|
|
612
|
+
|
|
613
|
+
Need help finding CSS selectors? Try the [Rayrun browser extension](https://chromewebstore.google.com/detail/rayrun/olljocejdgeipcaompahmnfebhkfmnma) to select elements and generate selectors.
|
|
614
|
+
|
|
615
|
+
For more details, see the [Browser Commands documentation](https://scrapi.tech/docs/api_details/v1_scrape/browser_commands).
|
|
616
|
+
|
|
617
|
+
## Troubleshooting
|
|
618
|
+
|
|
619
|
+
### The MCP client cannot find the server
|
|
620
|
+
|
|
621
|
+
- Confirm Node.js 18 or newer is installed if using `npx`.
|
|
622
|
+
- Confirm Docker Desktop is running if using Docker.
|
|
623
|
+
- Restart the MCP client after editing its config file.
|
|
624
|
+
- Check that the configured command works in a terminal.
|
|
625
|
+
|
|
626
|
+
### The tools appear, but scraping fails
|
|
627
|
+
|
|
628
|
+
- Confirm `SCRAPI_API_KEY` is set correctly.
|
|
629
|
+
- Try the same URL without `browserCommands`.
|
|
630
|
+
- Make sure `browserCommands` is a JSON array string, not a raw JSON array.
|
|
631
|
+
- Use `scrape_url_html` if Markdown extraction omits structure you need.
|
|
632
|
+
- Long-running pages, CAPTCHA flows, and heavy JavaScript pages can take several minutes.
|
|
633
|
+
|
|
634
|
+
### Browser commands are ignored
|
|
635
|
+
|
|
636
|
+
The server only sends browser commands when `browserCommands` parses as a JSON array. This is valid:
|
|
637
|
+
|
|
638
|
+
```json
|
|
639
|
+
{
|
|
640
|
+
"browserCommands": "[{\"click\":\"#accept-cookies\"}]"
|
|
641
|
+
}
|
|
642
|
+
```
|
|
643
|
+
|
|
644
|
+
This is not valid for this MCP tool schema because the value is a raw array of objects, not a string:
|
|
645
|
+
|
|
646
|
+
```json
|
|
647
|
+
{
|
|
648
|
+
"browserCommands": [{ "click": "#accept-cookies" }]
|
|
649
|
+
}
|
|
650
|
+
```
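A quick way to check an argument before sending it is to test whether it is a string that parses to a JSON array. This is a hypothetical client-side check written for illustration, not the server's actual code.

```typescript
// Hypothetical check: true only when the value is a string that parses to a JSON array.
function isJsonArrayString(value: unknown): boolean {
  if (typeof value !== "string") {
    return false; // raw arrays and objects are rejected here
  }
  try {
    return Array.isArray(JSON.parse(value));
  } catch {
    return false; // not valid JSON at all
  }
}

const validArg = isJsonArrayString('[{"click":"#accept-cookies"}]'); // true
const rawArray = isJsonArrayString([{ click: "#accept-cookies" }]); // false
const notArray = isJsonArrayString('{"click":"#accept-cookies"}'); // false
```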
|
|
651
|
+
|
|
652
|
+
### HTTP endpoint does not respond
|
|
653
|
+
|
|
654
|
+
- Confirm the server was started with `TRANSPORT=http`.
|
|
655
|
+
- Confirm the client connects to `/mcp`, not `/`.
|
|
656
|
+
- Confirm the port matches `PORT`.
|
|
657
|
+
|
|
658
|
+
## Development
|
|
659
|
+
|
|
660
|
+
Install dependencies:
|
|
661
|
+
|
|
662
|
+
```bash
|
|
663
|
+
npm install
|
|
664
|
+
```
|
|
665
|
+
|
|
666
|
+
Run tests:
|
|
667
|
+
|
|
668
|
+
```bash
|
|
669
|
+
npm test
|
|
670
|
+
```
|
|
671
|
+
|
|
672
|
+
Build:
|
|
673
|
+
|
|
674
|
+
```bash
|
|
675
|
+
npm run build
|
|
676
|
+
```
|
|
677
|
+
|
|
678
|
+
Run from source in stdio mode:
|
|
679
|
+
|
|
680
|
+
```bash
|
|
681
|
+
npm run build
|
|
682
|
+
node dist/index.js
|
|
683
|
+
```
|
|
684
|
+
|
|
685
|
+
Run from source in HTTP mode:
|
|
686
|
+
|
|
687
|
+
```bash
|
|
688
|
+
TRANSPORT=http PORT=5000 node dist/index.js
|
|
689
|
+
```
|
|
690
|
+
|
|
691
|
+
Build the Docker image:
|
|
149
692
|
|
|
150
693
|
```bash
|
|
151
694
|
docker build -t deventerprisesoftware/scrapi-mcp -f Dockerfile .
|
|
152
695
|
```
|
|
153
696
|
|
|
697
|
+
Or use the package script:
|
|
698
|
+
|
|
699
|
+
```bash
|
|
700
|
+
npm run docker:build
|
|
701
|
+
```
|
|
702
|
+
|
|
154
703
|
## License
|
|
155
704
|
|
|
156
|
-
This MCP server is licensed under the MIT License.
|
|
705
|
+
This MCP server is licensed under the MIT License. You are free to use, modify, and distribute the software subject to the terms of the MIT License. See [LICENSE](LICENSE) for details.
|
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@deventerprisesoftware/scrapi-mcp",
-  "version": "0.4.
+  "version": "0.4.1",
   "description": "MCP server for using ScrAPI to scrape web pages.",
   "keywords": [
     "mcp",
@@ -56,12 +56,12 @@
     "@types/cors": "^2.8.19",
     "@types/express": "^5.0.6",
     "@types/node": "^25.9.1",
-    "eslint": "^10.4.
+    "eslint": "^10.4.1",
     "eslint-config-prettier": "^10.1.8",
     "prettier": "^3.8.3",
     "shx": "^0.4.0",
     "typescript": "^6.0.3",
-    "typescript-eslint": "^8.
-    "vitest": "^4.1.
+    "typescript-eslint": "^8.60.1",
+    "vitest": "^4.1.8"
   }
 }