@deventerprisesoftware/scrapi-mcp 0.4.0 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/LICENSE +21 -0
  2. package/README.md +627 -78
  3. package/package.json +4 -4
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 DevEnterprise Software
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md CHANGED
@@ -5,101 +5,171 @@
  [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
  [![NPM Downloads](https://img.shields.io/npm/dm/@deventerprisesoftware/scrapi-mcp)](https://www.npmjs.com/package/@deventerprisesoftware/scrapi-mcp)
  [![Docker Pulls](https://img.shields.io/docker/pulls/deventerprisesoftware/scrapi-mcp)](https://hub.docker.com/r/deventerprisesoftware/scrapi-mcp)
- [![smithery badge](https://smithery.ai/badge/@DevEnterpriseSoftware/scrapi-mcp)](https://smithery.ai/server/@DevEnterpriseSoftware/scrapi-mcp)
+ [![smithery badge](https://smithery.ai/badge/DevEnterpriseSoftware/scrapi-mcp)](https://smithery.ai/servers/DevEnterpriseSoftware/scrapi-mcp)
+ [![scrapi-mcp MCP server](https://glama.ai/mcp/servers/DevEnterpriseSoftware/scrapi-mcp/badges/score.svg)](https://glama.ai/mcp/servers/DevEnterpriseSoftware/scrapi-mcp)
 
- MCP server for using [ScrAPI](https://scrapi.tech) to scrape web pages.
+ ScrAPI MCP Server lets MCP-compatible clients scrape web pages through [ScrAPI](https://scrapi.tech).
 
- ScrAPI is your ultimate web scraping solution, offering powerful, reliable, and easy-to-use features to extract data from any website effortlessly.
+ ScrAPI is useful when a page needs a real browser session, CAPTCHA solving, residential proxy access, cookie banner handling, JavaScript rendering, geolocation-aware fetching, or pre-scrape browser actions such as clicking and scrolling.
 
- <a href="https://glama.ai/mcp/servers/@DevEnterpriseSoftware/scrapi-mcp">
- <img width="380" height="200" src="https://glama.ai/mcp/servers/@DevEnterpriseSoftware/scrapi-mcp/badge" alt="ScrAPI Server MCP server" />
- </a>
+ ## Contents
 
- ## Tools
+ - [Features](#features)
+ - [Available tools](#available-tools)
+ - [Prerequisites](#prerequisites)
+ - [API key](#api-key)
+ - [Quick start](#quick-start)
+ - [MCP client setup](#mcp-client-setup)
+ - [HTTP transport](#http-transport)
+ - [Cloud-hosted server](#cloud-hosted-server)
+ - [Usage examples](#usage-examples)
+ - [Browser commands](#browser-commands)
+ - [Troubleshooting](#troubleshooting)
+ - [Development](#development)
+ - [License](#license)
 
- 1. `scrape_url_html`
- - Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML.
- Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions.
- The result will be in HTML which is preferable if advanced parsing is required.
- - Inputs:
- - `url` (string, required): The URL to scrape
- - `browserCommands` (string, optional): JSON array of browser commands to execute before scraping
- - Returns: HTML content of the URL
+ ## Features
 
- 2. `scrape_url_markdown`
- - Use a URL to scrape a website using the ScrAPI service and retrieve the result as Markdown.
- Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions.
- The result will be in Markdown which is preferable if the text content of the webpage is important and not the structural information of the page.
- - Inputs:
- - `url` (string, required): The URL to scrape
- - `browserCommands` (string, optional): JSON array of browser commands to execute before scraping
- - Returns: Markdown content of the URL
+ - Scrape any valid `https://` or `http://` URL through ScrAPI.
+ - Return either raw HTML or readable Markdown.
+ - Run browser commands before scraping.
+ - Use stdio transport for desktop MCP clients.
+ - Use Streamable HTTP transport for remote MCP clients and local testing.
+ - Run with `npx`, Docker, Smithery, or from source.
 
- ## Browser Commands
+ ## Available Tools
 
- Both tools support optional browser commands that allow you to interact with the page before scraping. This is useful for:
+ ### `scrape_url_html`
 
- - Clicking buttons (e.g., "Accept Cookies", "Load More")
- - Filling out forms
- - Selecting dropdown options
- - Scrolling to load dynamic content
- - Waiting for elements to appear
- - Executing custom JavaScript
+ Scrapes a URL and returns the result as HTML.
 
- ### Available Commands
+ Use this when you need the page structure, links, tables, embedded metadata, or custom downstream parsing.
 
- Commands are provided as a JSON array string. All commands are executed with human-like behavior (random mouse movements, variable typing speed, etc.):
+ Inputs:
 
- | Command | Format | Description |
- | -------------- | ----------------------------------------------- | ------------------------------------------------- |
- | **Click** | `{"click": "#buttonId"}` | Click an element using CSS selector |
- | **Input** | `{"input": {"input[name='email']": "value"}}` | Fill an input field |
- | **Select** | `{"select": {"select[name='country']": "USA"}}` | Select from dropdown (by value or text) |
- | **Scroll** | `{"scroll": 1000}` | Scroll down by pixels (negative values scroll up) |
- | **Wait** | `{"wait": 5000}` | Wait for milliseconds (max 15000) |
- | **WaitFor** | `{"waitfor": "#elementId"}` | Wait for element to appear in DOM |
- | **JavaScript** | `{"javascript": "console.log('test')"}` | Execute custom JavaScript code |
+ | Name | Type | Required | Description |
+ | ---- | ---- | -------- | ----------- |
+ | `url` | string | Yes | The absolute URL to scrape. Must be a valid URL. |
+ | `browserCommands` | string | No | JSON array string of browser commands to execute before scraping. |
 
- ### Example Usage
+ Returns:
 
- ```json
- [
- { "click": "#accept-cookies" },
- { "wait": 2000 },
- { "input": { "input[name='search']": "web scraping" } },
- { "click": "button[type='submit']" },
- { "waitfor": "#results" },
- { "scroll": 500 }
- ]
+ - `text/html` content from the requested page.
+ - `isError: true` with the ScrAPI error body when the upstream request fails.
+
+ ### `scrape_url_markdown`
+
+ Scrapes a URL and returns the result as Markdown.
+
+ Use this when the text content matters more than the HTML structure, for example article extraction, product copy, search result summaries, or LLM-friendly page analysis.
+
+ Inputs:
+
+ | Name | Type | Required | Description |
+ | ---- | ---- | -------- | ----------- |
+ | `url` | string | Yes | The absolute URL to scrape. Must be a valid URL. |
+ | `browserCommands` | string | No | JSON array string of browser commands to execute before scraping. |
+
+ Returns:
+
+ - `text/markdown` content from the requested page.
+ - `isError: true` with the ScrAPI error body when the upstream request fails.
+
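Both tools take the same `url` input, so a client can reject bad values before making a tool call. The sketch below is a hypothetical client-side pre-check, not code shipped in `@deventerprisesoftware/scrapi-mcp`; `isScrapableUrl`, `ScrapeArgs`, and `buildScrapeArgs` are illustrative names, and the only rule assumed is the documented one (an absolute `http://` or `https://` URL).

```typescript
// Hypothetical helper, not exported by the package.
// Accepts only absolute http:// or https:// URLs, matching the Inputs tables.
function isScrapableUrl(value: string): boolean {
  try {
    // The one-argument URL constructor throws on relative or malformed input.
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false;
  }
}

// Argument shape shared by scrape_url_html and scrape_url_markdown.
interface ScrapeArgs {
  url: string;
  browserCommands?: string; // JSON array *string*, not a raw array
}

// Hypothetical builder that validates the URL before a tool call is made.
function buildScrapeArgs(url: string, browserCommands?: string): ScrapeArgs {
  if (!isScrapableUrl(url)) {
    throw new Error(`Not an absolute http(s) URL: ${url}`);
  }
  return browserCommands === undefined ? { url } : { url, browserCommands };
}
```

Failing early on the client side is cheaper than waiting for an upstream `isError: true` response, which matters when scrapes can take minutes.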
+ ## Prerequisites
+
+ Choose one of the following runtime options:
+
+ - Node.js 18 or newer for `npx` or local development.
+ - Docker for container-based usage.
+ - An MCP-compatible client such as Claude Desktop, MCP Inspector, or another client that supports stdio or Streamable HTTP MCP servers.
+
+ ## API Key
+
+ Set `SCRAPI_API_KEY` to use your ScrAPI account:
+
+ ```bash
+ export SCRAPI_API_KEY="your-scrapi-api-key"
  ```
 
- ### Finding CSS Selectors
+ PowerShell:
 
- Need help finding CSS selectors? Try the [Rayrun browser extension](https://chromewebstore.google.com/detail/rayrun/olljocejdgeipcaompahmnfebhkfmnma) to easily select elements and generate selectors.
+ ```powershell
+ $env:SCRAPI_API_KEY = "your-scrapi-api-key"
+ ```
 
- For more details, see the [Browser Commands documentation](https://scrapi.tech/docs/api_details/v1_scrape/browser_commands).
+ An API key is optional. Without one, ScrAPI currently allows limited free usage with lower concurrency and queueing priority.
 
- ## Setup
+ ## Quick Start
 
- ### API Key (optional)
+ ### Run with NPX
 
- Optionally get an API key from the [ScrAPI website](https://scrapi.tech).
+ The default transport is stdio, which is the transport most desktop MCP clients use when they launch a local server process.
 
- Without an API key you will be limited to one concurrent call and twenty free calls per day with minimal queuing capabilities.
+ ```bash
+ npx -y @deventerprisesoftware/scrapi-mcp
+ ```
 
- ### Cloud Server
+ With an API key:
 
- The ScrAPI MCP Server is also available in the cloud over SSE at https://api.scrapi.tech/mcp/sse and streamable HTTP at https://api.scrapi.tech/mcp
+ ```bash
+ SCRAPI_API_KEY="your-scrapi-api-key" npx -y @deventerprisesoftware/scrapi-mcp
+ ```
 
- Cloud MCP servers are not widely supported yet but you can access this directly from your own custom clients or use [MCP Inspector](https://github.com/modelcontextprotocol/inspector) to test it. There is currently no facility to pass through your API key when connecting to the cloud MCP server.
+ PowerShell:
+
+ ```powershell
+ $env:SCRAPI_API_KEY = "your-scrapi-api-key"
+ npx -y @deventerprisesoftware/scrapi-mcp
+ ```
 
- ![MCP-Inspector](https://raw.githubusercontent.com/DevEnterpriseSoftware/scrapi-mcp/master/img/mcp-inspector.jpg)
+ ### Run with Docker
 
- ### Usage with Claude Desktop
+ The published Docker image starts in HTTP mode by default and listens on port `5000`.
+
+ ```bash
+ docker run --rm -p 5000:5000 -e SCRAPI_API_KEY="your-scrapi-api-key" deventerprisesoftware/scrapi-mcp
+ ```
+
+ MCP endpoint:
+
+ ```text
+ http://localhost:5000/mcp
+ ```
+
+ To run the container as a stdio server for a local MCP client:
+
+ ```bash
+ docker run -i --rm -e TRANSPORT=stdio -e SCRAPI_API_KEY="your-scrapi-api-key" deventerprisesoftware/scrapi-mcp
+ ```
 
- Add the following to your `claude_desktop_config.json`:
+ ## MCP Client Setup
 
- #### Docker
+ Most local coding assistants use one of these two configuration shapes:
+
+ - Stdio: the client starts this package with `npx` or Docker and communicates over stdin/stdout.
+ - Streamable HTTP: you start this server yourself with `TRANSPORT=http`, then point the client at `http://localhost:5000/mcp` or your deployed URL.
+
+ When a client has a tool timeout setting, use a value close to `300000` milliseconds or `300` seconds. ScrAPI can take several minutes for pages that require CAPTCHA solving, browser rendering, or multiple browser commands.
+
+ ### Claude Desktop with NPX
+
+ Add this to your `claude_desktop_config.json`:
+
+ ```json
+ {
+ "mcpServers": {
+ "ScrAPI": {
+ "command": "npx",
+ "args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
+ "env": {
+ "SCRAPI_API_KEY": "your-scrapi-api-key"
+ }
+ }
+ }
+ }
+ ```
+
+ ### Claude Desktop with Docker
 
  ```json
  {
@@ -115,42 +185,521 @@ Add the following to your `claude_desktop_config.json`:
  "deventerprisesoftware/scrapi-mcp"
  ],
  "env": {
- "SCRAPI_API_KEY": "<YOUR_API_KEY>"
+ "SCRAPI_API_KEY": "your-scrapi-api-key"
  }
  }
  }
  }
  ```
 
- #### NPX
+ After changing the config, restart Claude Desktop. You should see the two ScrAPI tools available in the MCP tools list.
+
+ ![Claude Desktop](https://raw.githubusercontent.com/DevEnterpriseSoftware/scrapi-mcp/master/img/claude-desktop.jpg)
+
+ ### Cursor
+
+ Cursor supports project configuration at `.cursor/mcp.json` and global configuration at `~/.cursor/mcp.json`. See the [Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol).
+
+ Stdio configuration:
 
  ```json
  {
  "mcpServers": {
- "ScrAPI": {
+ "scrapi": {
  "command": "npx",
- "args": [
- "-y",
- "@deventerprisesoftware/scrapi-mcp"
- ],
+ "args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
  "env": {
- "SCRAPI_API_KEY": "<YOUR_API_KEY>"
+ "SCRAPI_API_KEY": "${env:SCRAPI_API_KEY}"
  }
  }
  }
  }
  ```
 
- ![Claude-Desktop](https://raw.githubusercontent.com/DevEnterpriseSoftware/scrapi-mcp/master/img/claude-desktop.jpg)
+ HTTP configuration:
+
+ ```json
+ {
+ "mcpServers": {
+ "scrapi": {
+ "url": "http://localhost:5000/mcp"
+ }
+ }
+ }
+ ```
+
+ For HTTP, start the server first:
+
+ ```bash
+ TRANSPORT=http PORT=5000 SCRAPI_API_KEY="your-scrapi-api-key" npx -y @deventerprisesoftware/scrapi-mcp
+ ```
+
+ ### Windsurf
+
+ Windsurf Cascade stores MCP servers in `~/.codeium/windsurf/mcp_config.json`. You can also add servers from `Windsurf Settings` > `Cascade` > `MCP Servers`. See the [Windsurf MCP documentation](https://docs.windsurf.com/windsurf/cascade/mcp).
+
+ Stdio configuration:
 
- ## Build
+ ```json
+ {
+ "mcpServers": {
+ "scrapi": {
+ "command": "npx",
+ "args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
+ "env": {
+ "SCRAPI_API_KEY": "${env:SCRAPI_API_KEY}"
+ }
+ }
+ }
+ }
+ ```
+
+ HTTP configuration:
+
+ ```json
+ {
+ "mcpServers": {
+ "scrapi": {
+ "serverUrl": "http://localhost:5000/mcp"
+ }
+ }
+ }
+ ```
+
+ Windsurf supports `serverUrl` or `url` for remote HTTP MCP servers. If your team uses enterprise MCP controls, the server ID in the admin whitelist must match the key name, for example `scrapi`.
+
+ ### Kilo Code
+
+ Kilo Code stores MCP configuration in the main Kilo config file. Use `~/.config/kilo/kilo.jsonc` for global configuration, `kilo.jsonc` in the project root, or `.kilo/kilo.jsonc` for project-specific configuration. See the [Kilo Code MCP documentation](https://kilo.ai/docs/automate/mcp/using-in-kilo-code).
+
+ Local stdio configuration:
+
+ ```jsonc
+ {
+ "mcp": {
+ "scrapi": {
+ "type": "local",
+ "command": ["npx", "-y", "@deventerprisesoftware/scrapi-mcp"],
+ "environment": {
+ "SCRAPI_API_KEY": "your-scrapi-api-key"
+ },
+ "enabled": true,
+ "timeout": 300000
+ }
+ }
+ }
+ ```
+
+ Remote HTTP configuration:
+
+ ```jsonc
+ {
+ "mcp": {
+ "scrapi": {
+ "type": "remote",
+ "url": "http://localhost:5000/mcp",
+ "enabled": true,
+ "timeout": 300000
+ }
+ }
+ }
+ ```
+
+ On Windows, if `npx` is not found from the Kilo Code UI, use `cmd` as the command and pass `/c`, `npx`, `-y`, and `@deventerprisesoftware/scrapi-mcp` as arguments.
+
+ ### Codex
+
+ Codex supports MCP servers in the CLI and IDE extension. Both use the same MCP configuration. By default, Codex stores it in `~/.codex/config.toml`; trusted projects can also use `.codex/config.toml`. See the [Codex MCP documentation](https://developers.openai.com/codex/mcp).
+
+ Add a stdio server with the Codex CLI:
+
+ ```bash
+ codex mcp add scrapi --env SCRAPI_API_KEY="your-scrapi-api-key" -- npx -y @deventerprisesoftware/scrapi-mcp
+ codex mcp list
+ ```
+
+ Equivalent `config.toml` stdio configuration:
+
+ ```toml
+ [mcp_servers.scrapi]
+ command = "npx"
+ args = ["-y", "@deventerprisesoftware/scrapi-mcp"]
+ startup_timeout_sec = 20
+ tool_timeout_sec = 300
+
+ [mcp_servers.scrapi.env]
+ SCRAPI_API_KEY = "your-scrapi-api-key"
+ ```
+
+ HTTP configuration:
+
+ ```toml
+ [mcp_servers.scrapi]
+ url = "http://localhost:5000/mcp"
+ tool_timeout_sec = 300
+ ```
+
+ In the Codex terminal UI, run `/mcp` to confirm the server is connected.
+
+ ### VS Code
+
+ VS Code stores MCP configuration in `.vscode/mcp.json` for a workspace or in your user profile. The top-level key is `servers`, not `mcpServers`. See the [VS Code MCP configuration reference](https://code.visualstudio.com/docs/agents/reference/mcp-configuration).
+
+ Stdio configuration:
+
+ ```json
+ {
+ "inputs": [
+ {
+ "type": "promptString",
+ "id": "scrapi-api-key",
+ "description": "ScrAPI API key",
+ "password": true
+ }
+ ],
+ "servers": {
+ "scrapi": {
+ "type": "stdio",
+ "command": "npx",
+ "args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
+ "env": {
+ "SCRAPI_API_KEY": "${input:scrapi-api-key}"
+ }
+ }
+ }
+ }
+ ```
+
+ HTTP configuration:
+
+ ```json
+ {
+ "servers": {
+ "scrapi": {
+ "type": "http",
+ "url": "http://localhost:5000/mcp"
+ }
+ }
+ }
+ ```
+
+ Use the Command Palette commands `MCP: Add Server`, `MCP: List Servers`, and `MCP: Reset Cached Tools` to add, inspect, and refresh MCP servers.
+
+ ### Claude Code
+
+ Claude Code supports MCP servers through the `claude mcp` CLI and the `/mcp` command inside Claude Code. See the [Claude Code MCP documentation](https://code.claude.com/docs/en/mcp).
+
+ Add a stdio server:
+
+ ```bash
+ claude mcp add --transport stdio --env SCRAPI_API_KEY="your-scrapi-api-key" scrapi -- npx -y @deventerprisesoftware/scrapi-mcp
+ claude mcp list
+ ```
+
+ Add an HTTP server:
+
+ ```bash
+ claude mcp add --transport http scrapi http://localhost:5000/mcp
+ claude mcp list
+ ```
 
- Docker build:
+ To make the server available across all Claude Code projects, add `--scope user` before the server name:
+
+ ```bash
+ claude mcp add --transport stdio --scope user --env SCRAPI_API_KEY="your-scrapi-api-key" scrapi -- npx -y @deventerprisesoftware/scrapi-mcp
+ ```
+
+ Inside Claude Code, run `/mcp` to confirm the server is connected.
+
+ ### Generic Stdio MCP Client
+
+ Use this shape for clients that accept a command, arguments, and environment variables:
+
+ ```json
+ {
+ "name": "ScrAPI",
+ "command": "npx",
+ "args": ["-y", "@deventerprisesoftware/scrapi-mcp"],
+ "env": {
+ "SCRAPI_API_KEY": "your-scrapi-api-key"
+ }
+ }
+ ```
+
+ ## HTTP Transport
+
+ Set `TRANSPORT=http` to run the server over Streamable HTTP.
+
+ ```bash
+ TRANSPORT=http PORT=5000 SCRAPI_API_KEY="your-scrapi-api-key" npx -y @deventerprisesoftware/scrapi-mcp
+ ```
+
+ PowerShell:
+
+ ```powershell
+ $env:TRANSPORT = "http"
+ $env:PORT = "5000"
+ $env:SCRAPI_API_KEY = "your-scrapi-api-key"
+ npx -y @deventerprisesoftware/scrapi-mcp
+ ```
+
+ The MCP endpoint is:
+
+ ```text
+ http://localhost:5000/mcp
+ ```
+
+ Environment variables:
+
+ | Name | Default | Description |
+ | ---- | ------- | ----------- |
+ | `SCRAPI_API_KEY` | Limited default key | ScrAPI API key used when calling the ScrAPI scrape API. |
+ | `TRANSPORT` | `stdio` | Use `stdio` or `http`. |
+ | `PORT` | `5000` | Port used when `TRANSPORT=http`. |
+
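The environment-variable table can be read as a small resolution step with defaults. The sketch below is an assumption about how such defaults could be applied, not the package's actual startup code; `resolveServerConfig`, `isTransport`, and the port-range check are hypothetical, and only the variable names and defaults come from the table.

```typescript
type Transport = "stdio" | "http";

interface ServerConfig {
  transport: Transport;
  port: number;
  apiKey?: string; // undefined: fall back to ScrAPI's limited default key
}

// Type guard so TypeScript narrows a plain string to the Transport union.
function isTransport(value: string): value is Transport {
  return value === "stdio" || value === "http";
}

// Hypothetical: applies the documented defaults TRANSPORT=stdio and PORT=5000.
function resolveServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const transport = env.TRANSPORT ?? "stdio";
  if (!isTransport(transport)) {
    throw new Error(`TRANSPORT must be "stdio" or "http", got "${transport}"`);
  }
  const port = Number(env.PORT ?? "5000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer from 1 to 65535, got "${env.PORT}"`);
  }
  // Treat an empty SCRAPI_API_KEY the same as an unset one.
  return { transport, port, apiKey: env.SCRAPI_API_KEY || undefined };
}
```

In a Node.js entry point this would typically be called as `resolveServerConfig(process.env)`; rejecting an unknown `TRANSPORT` up front gives a clearer error than a client that silently fails to connect.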
+ ### Test with MCP Inspector
+
+ Stdio mode:
+
+ ```bash
+ npx @modelcontextprotocol/inspector npx -y @deventerprisesoftware/scrapi-mcp
+ ```
+
+ HTTP mode:
+
+ ```bash
+ TRANSPORT=http PORT=5000 npx -y @deventerprisesoftware/scrapi-mcp
+ ```
+
+ Then open MCP Inspector and connect to:
+
+ ```text
+ http://localhost:5000/mcp
+ ```
+
+ ![MCP Inspector](https://raw.githubusercontent.com/DevEnterpriseSoftware/scrapi-mcp/master/img/mcp-inspector.jpg)
+
+ ## Cloud-Hosted Server
+
+ ScrAPI also provides hosted MCP endpoints:
+
+ ```text
+ Streamable HTTP: https://api.scrapi.tech/mcp
+ SSE: https://api.scrapi.tech/mcp/sse
+ ```
+
+ Cloud MCP servers are not yet supported by every MCP client. They are most useful for custom clients, MCP Inspector, or platforms that support remote MCP servers.
+
+ To authenticate with your ScrAPI API key, pass it as a query parameter or request header:
+
+ - Query parameter: `https://api.scrapi.tech/mcp?apiKey=<YOUR_API_KEY>`
+ - Request header: `X-API-KEY: <YOUR_API_KEY>`
+
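A custom client can derive either authentication form from one key. This is a minimal sketch; the helper names are hypothetical, and only the endpoint URL, the `apiKey` query parameter, and the `X-API-KEY` header are taken from the list above.

```typescript
const CLOUD_MCP_ENDPOINT = "https://api.scrapi.tech/mcp";

// Hypothetical helper: option 1, the key as the `apiKey` query parameter.
function cloudEndpointWithKey(apiKey: string): string {
  const url = new URL(CLOUD_MCP_ENDPOINT);
  // searchParams.set percent-encodes characters that are not safe in a query string.
  url.searchParams.set("apiKey", apiKey);
  return url.toString();
}

// Hypothetical helper: option 2, the key in the X-API-KEY request header.
function cloudAuthHeaders(apiKey: string): Record<string, string> {
  return { "X-API-KEY": apiKey };
}
```

The header form keeps the key out of URLs, which tend to be recorded in proxy and server logs. With the official TypeScript MCP SDK, extra headers can usually be supplied through the Streamable HTTP client transport's `requestInit` option; check the SDK version you use.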
+ ## Usage Examples
+
+ The exact interaction depends on your MCP client. In most clients, you can either ask the model to use the ScrAPI tool or call the tool directly from a tool inspector.
+
+ ### Scrape a Page as Markdown
+
+ Tool:
+
+ ```text
+ scrape_url_markdown
+ ```
+
+ Arguments:
+
+ ```json
+ {
+ "url": "https://example.com"
+ }
+ ```
+
+ Example prompt:
+
+ ```text
+ Use ScrAPI to scrape https://example.com as Markdown and summarize the page.
+ ```
+
+ ### Scrape a Page as HTML
+
+ Tool:
+
+ ```text
+ scrape_url_html
+ ```
+
+ Arguments:
+
+ ```json
+ {
+ "url": "https://example.com"
+ }
+ ```
+
+ Example prompt:
+
+ ```text
+ Use ScrAPI to scrape https://example.com as HTML and extract every link.
+ ```
+
+ ### Accept Cookies Before Scraping
+
+ The `browserCommands` value must be a string containing a JSON array.
+
+ ```json
+ {
+ "url": "https://example.com",
+ "browserCommands": "[{\"click\":\"#accept-cookies\"},{\"wait\":1000}]"
+ }
+ ```
+
+ ### Search a Site Before Scraping Results
+
+ ```json
+ {
+ "url": "https://example.com/search",
+ "browserCommands": "[{\"input\":{\"input[name='q']\":\"web scraping\"}},{\"click\":\"button[type='submit']\"},{\"waitfor\":\"#results\"}]"
+ }
+ ```
+
+ ### Load More Content
+
+ ```json
+ {
+ "url": "https://example.com/products",
+ "browserCommands": "[{\"scroll\":1200},{\"wait\":1000},{\"click\":\"button.load-more\"},{\"waitfor\":\".product-card:nth-child(25)\"}]"
+ }
+ ```
+
+ ## Browser Commands
+
+ Both tools support optional browser commands that interact with the page before ScrAPI captures the final result.
+
+ Commands are provided as a JSON array string. They are executed with human-like behavior such as random mouse movement and variable typing speed.
+
+ | Command | Format | Description |
+ | ------- | ------ | ----------- |
+ | Click | `{"click": "#buttonId"}` | Click an element by CSS selector. |
+ | Input | `{"input": {"input[name='email']": "value"}}` | Fill an input field. |
+ | Select | `{"select": {"select[name='country']": "USA"}}` | Select an option by value or visible text. |
+ | Scroll | `{"scroll": 1000}` | Scroll down by pixels. Use a negative value to scroll up. |
+ | Wait | `{"wait": 5000}` | Wait for milliseconds. Maximum: `15000`. |
+ | WaitFor | `{"waitfor": "#elementId"}` | Wait for an element to appear in the DOM. |
+ | JavaScript | `{"javascript": "console.log('test')"}` | Execute custom JavaScript. |
+
+ Readable command array:
+
+ ```json
+ [
+ { "click": "#accept-cookies" },
+ { "wait": 2000 },
+ { "input": { "input[name='search']": "web scraping" } },
+ { "click": "button[type='submit']" },
+ { "waitfor": "#results" },
+ { "scroll": 500 }
+ ]
+ ```
+
+ Escaped as an MCP tool argument:
+
+ ```json
+ {
+ "url": "https://example.com",
+ "browserCommands": "[{\"click\":\"#accept-cookies\"},{\"wait\":2000},{\"input\":{\"input[name='search']\":\"web scraping\"}},{\"click\":\"button[type='submit']\"},{\"waitfor\":\"#results\"},{\"scroll\":500}]"
+ }
+ ```
+
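Escaping the command array by hand is error-prone. In TypeScript, `JSON.stringify` can produce the `browserCommands` string from a typed array instead. The `BrowserCommand` union below is a sketch assembled from the command table, not a type exported by the package, and the `15000` ms check only mirrors the documented `wait` maximum.

```typescript
// Sketch of the command shapes listed in the Browser Commands table.
type BrowserCommand =
  | { click: string }
  | { input: Record<string, string> }
  | { select: Record<string, string> }
  | { scroll: number }
  | { wait: number }
  | { waitfor: string }
  | { javascript: string };

const MAX_WAIT_MS = 15000; // documented maximum for the "wait" command

// Serializes commands into the JSON array *string* that browserCommands expects.
function toBrowserCommandsArg(commands: BrowserCommand[]): string {
  for (const command of commands) {
    // The `in` check narrows the union to { wait: number }.
    if ("wait" in command && command.wait > MAX_WAIT_MS) {
      throw new Error(`wait of ${command.wait} ms exceeds ${MAX_WAIT_MS} ms`);
    }
  }
  return JSON.stringify(commands);
}

// The readable array from above, turned into tool arguments.
const args = {
  url: "https://example.com",
  browserCommands: toBrowserCommandsArg([
    { click: "#accept-cookies" },
    { wait: 2000 },
    { input: { "input[name='search']": "web scraping" } },
    { click: "button[type='submit']" },
    { waitfor: "#results" },
    { scroll: 500 },
  ]),
};
```

Serializing `args` itself with `JSON.stringify` yields the escaped form shown above. MCP client libraries usually perform that outer serialization, so only the inner `toBrowserCommandsArg` call is needed in practice.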
+ Need help finding CSS selectors? Try the [Rayrun browser extension](https://chromewebstore.google.com/detail/rayrun/olljocejdgeipcaompahmnfebhkfmnma) to select elements and generate selectors.
+
+ For more details, see the [Browser Commands documentation](https://scrapi.tech/docs/api_details/v1_scrape/browser_commands).
+
+ ## Troubleshooting
+
+ ### The MCP client cannot find the server
+
+ - Confirm Node.js 18 or newer is installed if using `npx`.
+ - Confirm Docker Desktop is running if using Docker.
+ - Restart the MCP client after editing its config file.
+ - Check that the configured command works in a terminal.
+
+ ### The tools appear, but scraping fails
+
+ - Confirm `SCRAPI_API_KEY` is set correctly.
+ - Try the same URL without `browserCommands`.
+ - Make sure `browserCommands` is a JSON array string, not a raw JSON array.
+ - Use `scrape_url_html` if Markdown extraction omits structure you need.
+ - Long-running pages, CAPTCHA flows, and heavy JavaScript pages can take several minutes.
+
+ ### Browser commands are ignored
+
+ The server only sends browser commands when `browserCommands` parses as a JSON array. This is valid:
+
+ ```json
+ {
+ "browserCommands": "[{\"click\":\"#accept-cookies\"}]"
+ }
+ ```
+
+ This is not valid for this MCP tool schema because it is an object array, not a string:
+
+ ```json
+ {
+ "browserCommands": [{ "click": "#accept-cookies" }]
+ }
+ ```
+
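The rule above reduces to a small check that a client can run before calling the tool, so a malformed value is caught instead of being silently dropped. This is a sketch of the documented behavior, not the server's source; `parseBrowserCommands` is a hypothetical name.

```typescript
// Returns the parsed command array, or undefined when the value would not be used.
function parseBrowserCommands(value: unknown): unknown[] | undefined {
  // A raw (non-string) array does not match the tool schema, which expects a string.
  if (typeof value !== "string") return undefined;
  try {
    const parsed: unknown = JSON.parse(value);
    // Valid JSON that is not an array, such as a single object, is also not used.
    return Array.isArray(parsed) ? parsed : undefined;
  } catch {
    // Malformed JSON.
    return undefined;
  }
}
```

A typical use is to warn when `parseBrowserCommands(args.browserCommands)` returns `undefined` for a value the caller intended to send.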
+ ### HTTP endpoint does not respond
+
+ - Confirm the server was started with `TRANSPORT=http`.
+ - Confirm the client connects to `/mcp`, not `/`.
+ - Confirm the port matches `PORT`.
+
+ ## Development
+
+ Install dependencies:
+
+ ```bash
+ npm install
+ ```
+
+ Run tests:
+
+ ```bash
+ npm test
+ ```
+
+ Build:
+
+ ```bash
+ npm run build
+ ```
+
+ Run from source in stdio mode:
+
+ ```bash
+ npm run build
+ node dist/index.js
+ ```
+
+ Run from source in HTTP mode:
+
+ ```bash
+ TRANSPORT=http PORT=5000 node dist/index.js
+ ```
+
+ Build the Docker image:
 
  ```bash
  docker build -t deventerprisesoftware/scrapi-mcp -f Dockerfile .
  ```
 
+ Or use the package script:
+
+ ```bash
+ npm run docker:build
+ ```
+
  ## License
 
- This MCP server is licensed under the MIT License. This means you are free to use, modify, and distribute the software, subject to the terms and conditions of the MIT License. For more details, please see the LICENSE file in the project repository.
+ This MCP server is licensed under the MIT License. You are free to use, modify, and distribute the software subject to the terms of the MIT License. See [LICENSE](LICENSE) for details.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@deventerprisesoftware/scrapi-mcp",
- "version": "0.4.0",
+ "version": "0.4.1",
  "description": "MCP server for using ScrAPI to scrape web pages.",
  "keywords": [
  "mcp",
@@ -56,12 +56,12 @@
  "@types/cors": "^2.8.19",
  "@types/express": "^5.0.6",
  "@types/node": "^25.9.1",
- "eslint": "^10.4.0",
+ "eslint": "^10.4.1",
  "eslint-config-prettier": "^10.1.8",
  "prettier": "^3.8.3",
  "shx": "^0.4.0",
  "typescript": "^6.0.3",
- "typescript-eslint": "^8.59.4",
- "vitest": "^4.1.7"
+ "typescript-eslint": "^8.60.1",
+ "vitest": "^4.1.8"
  }
  }