mcp-scraper 0.1.9 → 0.2.1
- package/README.md +74 -8
- package/dist/bin/api-server.cjs +5615 -3733
- package/dist/bin/api-server.cjs.map +1 -1
- package/dist/bin/api-server.js +2 -2
- package/dist/bin/browser-agent-stdio-server.cjs +391 -0
- package/dist/bin/browser-agent-stdio-server.cjs.map +1 -0
- package/dist/bin/browser-agent-stdio-server.d.cts +1 -0
- package/dist/bin/browser-agent-stdio-server.d.ts +1 -0
- package/dist/bin/browser-agent-stdio-server.js +390 -0
- package/dist/bin/browser-agent-stdio-server.js.map +1 -0
- package/dist/bin/mcp-stdio-server.cjs +170 -12
- package/dist/bin/mcp-stdio-server.cjs.map +1 -1
- package/dist/bin/mcp-stdio-server.js +3 -2
- package/dist/bin/mcp-stdio-server.js.map +1 -1
- package/dist/bin/paa-harvest.cjs +223 -74
- package/dist/bin/paa-harvest.cjs.map +1 -1
- package/dist/bin/paa-harvest.js +2 -2
- package/dist/{chunk-ZK456YXN.js → chunk-IQOCZGJJ.js} +58 -4
- package/dist/chunk-IQOCZGJJ.js.map +1 -0
- package/dist/{chunk-ZMOWIBMK.js → chunk-M2S27J6Z.js} +9 -2
- package/dist/{chunk-ZMOWIBMK.js.map → chunk-M2S27J6Z.js.map} +1 -1
- package/dist/{chunk-TM22BLWP.js → chunk-MY3S7EX7.js} +221 -76
- package/dist/chunk-MY3S7EX7.js.map +1 -0
- package/dist/{chunk-JNC32DMS.js → chunk-OR7DLLH2.js} +175 -16
- package/dist/chunk-OR7DLLH2.js.map +1 -0
- package/dist/chunk-XR65SANX.js +7 -0
- package/dist/chunk-XR65SANX.js.map +1 -0
- package/dist/index.cjs +223 -74
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +1 -0
- package/dist/index.d.ts +1 -0
- package/dist/index.js +2 -2
- package/dist/{server-MTXAJG5J.js → server-CJMX2QUM.js} +1655 -194
- package/dist/server-CJMX2QUM.js.map +1 -0
- package/dist/{worker-AUCXFHEL.js → worker-NAKGTIF5.js} +4 -4
- package/docs/specs/api-forge-spec.md +234 -0
- package/docs/specs/deferred-work-spec.md +74 -0
- package/docs/specs/oauth-mcp-spec.md +213 -0
- package/package.json +3 -2
- package/dist/chunk-JNC32DMS.js.map +0 -1
- package/dist/chunk-TM22BLWP.js.map +0 -1
- package/dist/chunk-ZK456YXN.js.map +0 -1
- package/dist/server-MTXAJG5J.js.map +0 -1
- package/dist/{worker-AUCXFHEL.js.map → worker-NAKGTIF5.js.map} +0 -0
package/README.md
CHANGED
@@ -4,14 +4,26 @@ MCP Scraper is an MCP server for live web intelligence tools backed by `https://
 
 ## Install
 
-Use the npm package from any MCP client that can run
+Use the npm package from any MCP client that can run local stdio commands. MCP Scraper ships two separate local MCP servers:
+
+- `mcp-scraper` — live web intelligence, SERP, PAA, site extraction, YouTube, Facebook, Maps, directory, and credit tools.
+- `browser-agent` — an agent-controlled live cloud browser with screenshots, clicks, typing, scrolling, live watch URLs, replay links, and MP4 replay download.
+
+Claude Desktop:
 
 ```json
 {
   "mcpServers": {
     "mcp-scraper": {
       "command": "npx",
-
+      "args": ["-y", "mcp-scraper@latest"],
+      "env": {
+        "MCP_SCRAPER_API_KEY": "sk_live_your_key"
+      }
+    },
+    "browser-agent": {
+      "command": "npx",
+      "args": ["-y", "-p", "mcp-scraper@latest", "browser-agent"],
       "env": {
         "MCP_SCRAPER_API_KEY": "sk_live_your_key"
       }
@@ -20,12 +32,13 @@ Use the npm package from any MCP client that can run a local command:
 }
 ```
 
-Existing MCP configs that use `npx -y mcp-scraper` do not
+Existing MCP configs that use only `npx -y mcp-scraper` still work for the web intelligence server, but they do not automatically add the `browser-agent` server. Add the second config entry if you want browser tools. Use `mcp-scraper@latest` to force npm to resolve the newest published package whenever the MCP client starts a fresh `npx` process.
 
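Reviewer note: the two-entry Claude Desktop config added in this hunk can also be generated rather than hand-edited. The following is a minimal sketch, assuming a hypothetical helper `buildMcpServers` and a hypothetical `StdioServerEntry` type (neither is part of the package); the commands, arguments, and environment variable name are copied from the README text above.

```typescript
// Hypothetical shape of one stdio server entry; mirrors the JSON in the README.
interface StdioServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper: returns both server entries so that `browser-agent`
// is not forgotten when only `mcp-scraper` was configured before.
function buildMcpServers(apiKey: string): Record<string, StdioServerEntry> {
  return {
    "mcp-scraper": {
      command: "npx",
      args: ["-y", "mcp-scraper@latest"],
      env: { MCP_SCRAPER_API_KEY: apiKey },
    },
    "browser-agent": {
      command: "npx",
      args: ["-y", "-p", "mcp-scraper@latest", "browser-agent"],
      env: { MCP_SCRAPER_API_KEY: apiKey },
    },
  };
}

// Placeholder key taken from the README; substitute a real key locally.
const config = { mcpServers: buildMcpServers("sk_live_your_key") };
const configJson: string = JSON.stringify(config, null, 2);
```

Writing `configJson` to the client's config file is left out on purpose, since the file location depends on the MCP client and operating system.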
 Claude Code:
 
 ```bash
-claude mcp add mcp-scraper --env MCP_SCRAPER_API_KEY=sk_live_your_key -- npx -y mcp-scraper
+claude mcp add mcp-scraper --scope user --env MCP_SCRAPER_API_KEY=sk_live_your_key -- npx -y mcp-scraper@latest
+claude mcp add browser-agent --scope user --env MCP_SCRAPER_API_KEY=sk_live_your_key -- npx -y -p mcp-scraper@latest browser-agent
 ```
 
 Codex config:
@@ -33,12 +46,19 @@ Codex config:
 ```toml
 [mcp_servers.mcp-scraper]
 command = "npx"
-args = ["-y", "mcp-scraper"]
+args = ["-y", "mcp-scraper@latest"]
+env = { MCP_SCRAPER_API_KEY = "sk_live_your_key" }
+
+[mcp_servers.browser-agent]
+command = "npx"
+args = ["-y", "-p", "mcp-scraper@latest", "browser-agent"]
 env = { MCP_SCRAPER_API_KEY = "sk_live_your_key" }
 ```
 
 ## Tools
 
+### `mcp-scraper` stdio tools
+
 - `harvest_paa`
 - `search_serp`
 - `extract_url`
@@ -51,15 +71,35 @@ env = { MCP_SCRAPER_API_KEY = "sk_live_your_key" }
 - `facebook_ad_transcribe`
 - `maps_search` — search Google Maps for multiple business/profile candidates. Use for GMB/GBP prospect lists, competitors, categories, and anything needing more than the Google 3-pack. `maxResults` defaults to 10 and is capped at 50.
 - `maps_place_intel` — hydrate one known/named Google Maps business with profile details and optional reviews. Use after `maps_search` when a selected candidate needs full details.
+- `directory_workflow` — build city-by-city directory/prospecting datasets from Census place selection plus Google Maps searches. Use it for requests like "all cities over 100k population in Tennessee, then get 20 roofers from Maps." The saved CSV includes `source_location`, `result_position`, `business_name`, `review_stars`, `category`, `address`, `phone`, `hours_status`, `website_url`, `directions_url`, `place_url`, `cid`, `cid_decimal`, Census population, and ZIP groups. It captures Maps star ratings from list cards, not profile review counts.
 - `credits_info`
 
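Reviewer note: the `maps_search` entry above states that `maxResults` defaults to 10 and is capped at 50. A caller that wants to normalize its own input before invoking the tool could do something like the sketch below. The function name `resolveMaxResults` and the lower bound of 1 are assumptions for illustration; the README does not say how the server treats values below 1 or non-integers.

```typescript
const DEFAULT_MAX_RESULTS = 10; // from the README: default value
const MAX_RESULTS_CAP = 50;     // from the README: upper cap
const MIN_RESULTS = 1;          // assumption: the README states no lower bound

// Hypothetical client-side helper: returns the value to send as `maxResults`.
function resolveMaxResults(requested?: number): number {
  // Fall back to the documented default when nothing usable was supplied.
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_MAX_RESULTS;
  }
  // Round down to an integer, then clamp into [MIN_RESULTS, MAX_RESULTS_CAP].
  const whole = Math.floor(requested);
  return Math.min(MAX_RESULTS_CAP, Math.max(MIN_RESULTS, whole));
}
```

Clamping on the client keeps the request within the documented cap; whether the server itself clamps or rejects larger values is not stated in this diff.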
+### `browser-agent` stdio tools
+
+- `browser_open` — open a live cloud browser session. Returns a `session_id`, a human `watch_url`, and the raw `live_view_url` when available.
+- `browser_screenshot` — capture a screenshot plus visible text and clickable element coordinates.
+- `browser_read` — read the current page text and elements without an image.
+- `browser_goto`
+- `browser_click`
+- `browser_type`
+- `browser_scroll`
+- `browser_press`
+- `browser_replay_start` — start an MP4 replay. Returns `replay_id`, `view_url`, and `download_url` when available.
+- `browser_replay_stop` — stop a replay. Returns the final `view_url` and `download_url`.
+- `browser_list_replays` — list replay videos for a session.
+- `browser_replay_download` — download and save the replay MP4 locally under `MCP_SCRAPER_OUTPUT_DIR/browser-replays`.
+- `browser_close`
+- `browser_list_sessions`
+
+For US local SERP tools (`harvest_paa` and `search_serp`), keep `proxyMode` at the default `location` unless you are debugging. Location mode uses fresh residential proxy IDs across retries and treats CAPTCHA, proxy tunnel failure, and wrong-location evidence as retryable before returning.
+
 Chaining tools (`maps_search`, `map_site_urls`, `youtube_harvest`, `facebook_ad_search`, `facebook_page_intel`) advertise an `outputSchema` and return `structuredContent` with the IDs and URLs needed by the next tool. All tools carry MCP annotations (`readOnlyHint: true`, `openWorldHint: true` for live-web tools).
 
-The hosted MCP endpoint at `https://mcpscraper.dev/mcp` exposes
+The hosted MCP endpoint at `https://mcpscraper.dev/mcp` exposes the 14 `mcp-scraper` tools plus `capture_serp_snapshot` and `capture_serp_page_snapshots` (16 total). The `browser-agent` server is currently a separate local stdio server; its REST backing API lives under `https://mcpscraper.dev/agent/*`.
 
 ## Resources
 
-The NPX stdio server also exposes saved reports as MCP resources: `resources/list` returns the most recent Markdown reports from your output directory as `report://` URIs, and `resources/read` returns their content — so an MCP client can pull prior research into context without re-scraping or spending credits. The hosted endpoint does not expose resources (it saves no files).
+The `mcp-scraper` NPX stdio server also exposes saved reports as MCP resources: `resources/list` returns the most recent Markdown reports from your output directory as `report://` URIs, and `resources/read` returns their content — so an MCP client can pull prior research into context without re-scraping or spending credits. The hosted endpoint does not expose resources (it saves no files).
 
 ## Environment
 
@@ -69,7 +109,33 @@ The NPX stdio server also exposes saved reports as MCP resources: `resources/lis
 - `MCP_SCRAPER_SAVE_REPORTS=false` disables automatic Markdown report files.
 - `MCP_SCRAPER_KEY_PATH` is optional. When no API key env var is set, the server also reads `~/.mcp-scraper-key` for compatibility with older installs.
 
-Every tool call made through the NPX stdio server saves a full Markdown report locally by default and returns the file path in the MCP response. The hosted `/mcp` endpoint returns reports inline only and never writes files.
+Every web intelligence tool call made through the `mcp-scraper` NPX stdio server saves a full Markdown report locally by default and returns the file path in the MCP response. The hosted `/mcp` endpoint returns reports inline only and never writes files. Browser replay downloads are saved by `browser_replay_download` under `MCP_SCRAPER_OUTPUT_DIR/browser-replays`.
+
+## Updating Existing Installs
+
+Hosted API and website changes deploy immediately to `https://mcpscraper.dev`. Local stdio MCP changes require publishing a new npm package version and restarting the MCP client. Running MCP server processes do not hot-update, and tool names/descriptions are loaded when the local server process starts.
+
+Recommended config for update-friendly installs:
+
+```bash
+npx -y mcp-scraper@latest
+npx -y -p mcp-scraper@latest browser-agent
+```
+
+If a user configured `mcp-scraper@0.2.0`, installed globally with `npm install -g mcp-scraper`, or installed it as a project dependency, they will stay on that version until they update the config or reinstall:
+
+```bash
+npm update -g mcp-scraper
+npm install mcp-scraper@latest
+```
+
+Users who do not update can keep using the tools their local package already advertises, but they will not see newly added local stdio tools, schemas, or AI-facing descriptions. For example, a client running an older local package cannot call `directory_workflow` through stdio even if the hosted API already supports it. Users who configured only `mcp-scraper` must add `browser-agent` separately; MCP clients do not auto-create a second server entry from an existing config.
+
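Reviewer note: the update guidance above hinges on whether a configured package spec names an exact version (such as `mcp-scraper@0.2.0`) or a tag (such as `mcp-scraper@latest`). A small check like the following could flag configs that will stay on an old version. This is only a sketch: `classifySpec` and its three labels are hypothetical, and the version test is a simple `major.minor.patch` prefix match rather than a full semver parser.

```typescript
type SpecKind = "bare" | "exact" | "tag-or-range";

// Hypothetical helper: classifies an npm package spec string.
function classifySpec(spec: string): SpecKind {
  // Start searching at index 1 so the "@" of a scoped name (@scope/pkg) is skipped.
  const at = spec.indexOf("@", 1);
  if (at === -1) {
    return "bare"; // no version or tag given, e.g. "mcp-scraper"
  }
  const selector = spec.slice(at + 1);
  // Simplification: anything starting with digits.digits.digits counts as exact.
  return /^\d+\.\d+\.\d+/.test(selector) ? "exact" : "tag-or-range";
}

const kinds = ["mcp-scraper", "mcp-scraper@0.2.0", "mcp-scraper@latest"].map(classifySpec);
```

An `exact` result corresponds to the case the README describes as staying on that version until the config is changed.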
+## Branded One-Click Installs
+
+Raw `npx` MCP installs are command/config based. They do not provide a reliable user-facing install card, logo, or setup screen inside MCP clients. Do not print marketing text to stdout from an MCP server; stdout is reserved for JSON-RPC protocol messages.
+
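Reviewer note: the stdout rule above is easy to violate with a stray log line. The sketch below shows one way to keep the two streams apart, using in-memory sinks in place of the real process streams so it runs anywhere. The names `Sink`, `sendMessage`, and `logDiagnostic` are hypothetical and are not taken from the package; in a real stdio server the protocol sink would be stdout and the diagnostic sink would be stderr.

```typescript
// Minimal writable abstraction so the example does not depend on Node typings.
interface Sink {
  write(chunk: string): void;
}

interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
}

// Protocol traffic only: one JSON document per line on the protocol stream.
function sendMessage(protocolOut: Sink, message: JsonRpcMessage): void {
  protocolOut.write(JSON.stringify(message) + "\n");
}

// Human-readable text (banners, debug output) goes to the diagnostic stream.
function logDiagnostic(diagnosticOut: Sink, text: string): void {
  diagnosticOut.write(`[mcp-scraper] ${text}\n`);
}

// In-memory stand-ins for stdout and stderr.
const stdoutChunks: string[] = [];
const stderrChunks: string[] = [];
const fakeStdout: Sink = { write: (chunk) => { stdoutChunks.push(chunk); } };
const fakeStderr: Sink = { write: (chunk) => { stderrChunks.push(chunk); } };

logDiagnostic(fakeStderr, "server starting");
sendMessage(fakeStdout, { jsonrpc: "2.0", id: 1, result: { ok: true } });
```

With this split, everything captured on the protocol stream parses as JSON, which is what an MCP client reading the server's stdout expects.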
+For a branded Claude Desktop install, package MCP Scraper as an MCPB Desktop Extension. An MCPB bundle can include a `manifest.json`, bundled server files/dependencies, `user_config` fields for API-key setup, and an optional `icon.png`. That is the right path for a designed install experience with a logo and guided configuration.
 
 ## Development
 