firecrawl-mcp 3.3.5 → 3.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +51 -36
- package/dist/index.js +87 -81
- package/package.json +1 -1
package/README.md
CHANGED
@@ -12,7 +12,6 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec
 
 > Big thanks to [@vrknetha](https://github.com/vrknetha), [@knacklabs](https://www.knacklabs.ai) for the initial implementation!
 
-
 ## Features
 
 - Web scraping, crawling, and discovery
@@ -21,25 +20,6 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec
 - Automatic retries and rate limiting
 - Cloud and self-hosted support
 - SSE support
-- **Context limit support for MCP compatibility**
-
-## Context Limiting for MCP
-
-All tools now support the `maxResponseSize` parameter to limit response size for better MCP compatibility. This is especially useful for large responses that may exceed MCP context limits.
-
-**Example Usage:**
-```json
-{
-"name": "firecrawl_scrape",
-"arguments": {
-"url": "https://example.com",
-"formats": ["markdown"],
-"maxResponseSize": 50000
-}
-}
-```
-
-When the response exceeds the specified limit, content will be truncated with a clear message indicating truncation occurred. This parameter is optional and preserves full backward compatibility.
 
 > Play around with [our MCP Server on MCP.so's playground](https://mcp.so/playground?server=firecrawl-mcp-server) or on [Klavis AI](https://www.klavis.ai/mcp-servers).
 
@@ -83,7 +63,7 @@ To configure Firecrawl MCP in Cursor **v0.48.6**
 }
 }
 ```
-
+
 To configure Firecrawl MCP in Cursor **v0.45.6**
 
 1. Open Cursor Settings
@@ -94,8 +74,6 @@ To configure Firecrawl MCP in Cursor **v0.45.6**
 - Type: "command"
 - Command: `env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp`
 
-
-
 > If you are using Windows and are running into issues, try `cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"`
 
 Replace `your-api-key` with your Firecrawl API key. If you don't have one yet, you can create an account and get it from https://www.firecrawl.dev/app/api-keys
@@ -120,15 +98,15 @@ Add this to your `./codeium/windsurf/model_config.json`:
 }
 ```
 
-### Running with
+### Running with Streamable HTTP Local Mode
 
-To run the server using
+To run the server using Streamable HTTP locally instead of the default stdio transport:
 
 ```bash
-env
+env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
 ```
 
-Use the url: http://localhost:3000/
+Use the url: http://localhost:3000/mcp
 
 ### Installing via Smithery (Legacy)
 
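For context on the Streamable HTTP hunk above: once the server is started with `HTTP_STREAMABLE_SERVER=true`, an MCP client connects to `http://localhost:3000/mcp`. A minimal sketch, assuming the `@modelcontextprotocol/sdk` client package and its Streamable HTTP transport (neither is part of this diff; the import paths and call signatures below belong to that SDK, not to firecrawl-mcp):

```javascript
// Minimal client sketch (assumption: @modelcontextprotocol/sdk is installed;
// its client APIs are not part of this package diff).
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

// Server started beforehand with:
//   env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3000/mcp'));
const client = new Client({ name: 'example-client', version: '1.0.0' });

await client.connect(transport);

// Call one of the tools documented in the README.
const result = await client.callTool({
  name: 'firecrawl_scrape',
  arguments: { url: 'https://example.com', formats: ['markdown'] },
});
console.log(result);

await client.close();
```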
@@ -341,14 +319,14 @@ Use this guide to select the right tool for your task:
 
 ### Quick Reference Table
 
-| Tool
-
-| scrape
-| batch_scrape
-| map
-| crawl
-| search
-| extract
+| Tool | Best for | Returns |
+| ------------ | ----------------------------------- | --------------- |
+| scrape | Single page content | markdown/html |
+| batch_scrape | Multiple known URLs | markdown/html[] |
+| map | Discovering URLs on a site | URL[] |
+| crawl | Multi-page extraction (with limits) | markdown/html[] |
+| search | Web search for info | results[] |
+| extract | Structured data from pages | JSON |
 
 ## Available Tools
 
@@ -357,20 +335,25 @@ Use this guide to select the right tool for your task:
 Scrape content from a single URL with advanced options.
 
 **Best for:**
+
 - Single page content extraction, when you know exactly which page contains the information.
 
 **Not recommended for:**
+
 - Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
 - When you're unsure which page contains the information (use search)
 - When you need structured data (use extract)
 
 **Common mistakes:**
+
 - Using scrape for a list of URLs (use batch_scrape instead).
 
 **Prompt Example:**
+
 > "Get the content of the page at https://example.com."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_scrape",
@@ -389,6 +372,7 @@ Scrape content from a single URL with advanced options.
 ```
 
 **Returns:**
+
 - Markdown, HTML, or other formats as specified.
 
 ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
@@ -396,19 +380,24 @@ Scrape content from a single URL with advanced options.
 Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.
 
 **Best for:**
+
 - Retrieving content from multiple pages, when you know exactly which pages to scrape.
 
 **Not recommended for:**
+
 - Discovering URLs (use map first if you don't know the URLs)
 - Scraping a single page (use scrape)
 
 **Common mistakes:**
+
 - Using batch_scrape with too many URLs at once (may hit rate limits or token overflow)
 
 **Prompt Example:**
+
 > "Get the content of these three blog posts: [url1, url2, url3]."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_batch_scrape",
@@ -423,6 +412,7 @@ Scrape multiple URLs efficiently with built-in rate limiting and parallel proces
 ```
 
 **Returns:**
+
 - Response includes operation ID for status checking:
 
 ```json
@@ -455,20 +445,25 @@ Check the status of a batch operation.
 Map a website to discover all indexed URLs on the site.
 
 **Best for:**
+
 - Discovering URLs on a website before deciding what to scrape
 - Finding specific sections of a website
 
 **Not recommended for:**
+
 - When you already know which specific URL you need (use scrape or batch_scrape)
 - When you need the content of the pages (use scrape after mapping)
 
 **Common mistakes:**
+
 - Using crawl to discover URLs instead of map
 
 **Prompt Example:**
+
 > "List all URLs on example.com."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_map",
@@ -479,6 +474,7 @@ Map a website to discover all indexed URLs on the site.
 ```
 
 **Returns:**
+
 - Array of URLs found on the site
 
 ### 5. Search Tool (`firecrawl_search`)
@@ -486,17 +482,21 @@ Map a website to discover all indexed URLs on the site.
 Search the web and optionally extract content from search results.
 
 **Best for:**
+
 - Finding specific information across multiple websites, when you don't know which website has the information.
 - When you need the most relevant content for a query
 
 **Not recommended for:**
+
 - When you already know which website to scrape (use scrape)
 - When you need comprehensive coverage of a single website (use map or crawl)
 
 **Common mistakes:**
+
 - Using crawl or map for open-ended questions (use search instead)
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_search",
@@ -514,9 +514,11 @@ Search the web and optionally extract content from search results.
 ```
 
 **Returns:**
+
 - Array of search results (with optional scraped content)
 
 **Prompt Example:**
+
 > "Find the latest research papers on AI published in 2023."
 
 ### 6. Crawl Tool (`firecrawl_crawl`)
@@ -524,9 +526,11 @@ Search the web and optionally extract content from search results.
 Starts an asynchronous crawl job on a website and extract content from all pages.
 
 **Best for:**
+
 - Extracting content from multiple related pages, when you need comprehensive coverage.
 
 **Not recommended for:**
+
 - Extracting content from a single page (use scrape)
 - When token limits are a concern (use map + batch_scrape)
 - When you need fast results (crawling can be slow)
@@ -534,13 +538,16 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
 
 **Common mistakes:**
+
 - Setting limit or maxDepth too high (causes token overflow)
 - Using crawl for a single page (use scrape instead)
 
 **Prompt Example:**
+
 > "Get all blog posts from the first two levels of example.com/blog."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_crawl",
@@ -555,6 +562,7 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 ```
 
 **Returns:**
+
 - Response includes operation ID for status checking:
 
 ```json
@@ -583,20 +591,24 @@ Check the status of a crawl job.
 ```
 
 **Returns:**
+
 - Response includes the status of the crawl job:
-
+
 ### 8. Extract Tool (`firecrawl_extract`)
 
 Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
 
 **Best for:**
+
 - Extracting specific structured data like prices, names, details.
 
 **Not recommended for:**
+
 - When you need the full content of a page (use scrape)
 - When you're not looking for specific structured data
 
 **Arguments:**
+
 - `urls`: Array of URLs to extract information from
 - `prompt`: Custom prompt for the LLM extraction
 - `systemPrompt`: System prompt to guide the LLM
@@ -607,9 +619,11 @@ Extract structured information from web pages using LLM capabilities. Supports b
 
 When using a self-hosted instance, the extraction will use your configured LLM. For cloud API, it uses Firecrawl's managed LLM service.
 **Prompt Example:**
+
 > "Extract the product name, price, and description from these product pages."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_extract",
@@ -634,6 +648,7 @@ When using a self-hosted instance, the extraction will use your configured LLM.
 ```
 
 **Returns:**
+
 - Extracted structured data as defined by your schema
 
 ```json
package/dist/index.js
CHANGED
@@ -36,9 +36,9 @@ function removeEmptyTopLevel(obj) {
 return out;
 }
 class ConsoleLogger {
-shouldLog =
+shouldLog = process.env.CLOUD_SERVICE === 'true' ||
 process.env.SSE_LOCAL === 'true' ||
-process.env.HTTP_STREAMABLE_SERVER === 'true'
+process.env.HTTP_STREAMABLE_SERVER === 'true';
 debug(...args) {
 if (this.shouldLog) {
 console.debug('[DEBUG]', new Date().toISOString(), ...args);
@@ -119,24 +119,26 @@ function getClient(session) {
 return createClient(session.firecrawlApiKey);
 }
 // For self-hosted instances, API key is optional if FIRECRAWL_API_URL is provided
-if (!process.env.FIRECRAWL_API_URL &&
+if (!process.env.FIRECRAWL_API_URL &&
+(!session || !session.firecrawlApiKey)) {
 throw new Error('Unauthorized: API key is required when not using a self-hosted instance');
 }
 return createClient(session?.firecrawlApiKey);
 }
-function asText(data
-
-if (maxResponseSize && maxResponseSize > 0 && text.length > maxResponseSize) {
-const truncatedText = text.substring(0, maxResponseSize - 100); // Reserve space for truncation message
-return truncatedText + '\n\n[Content truncated due to size limit. Increase maxResponseSize parameter to see full content.]';
-}
-return text;
+function asText(data) {
+return JSON.stringify(data, null, 2);
 }
 // scrape tool (v2 semantics, minimal args)
 // Centralized scrape params (used by scrape, and referenced in search/crawl scrapeOptions)
 // Define safe action types
 const safeActionTypes = ['wait', 'screenshot', 'scroll', 'scrape'];
-const otherActions = [
+const otherActions = [
+'click',
+'write',
+'press',
+'executeJavascript',
+'generatePDF',
+];
 const allActionTypes = [...safeActionTypes, ...otherActions];
 // Use appropriate action types based on safe mode
 const allowedActionTypes = SAFE_MODE ? safeActionTypes : allActionTypes;
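The practical effect of the `asText` change above is that tool responses are no longer truncated client-side: the `maxResponseSize` handling on the removed lines is gone and every tool now returns the full SDK response, pretty-printed as JSON. A small sketch of the new behavior (the helper is copied from the hunk; the sample response object is illustrative only):

```javascript
// 3.4.0 behavior as shown in the hunk above: full pretty-printed JSON,
// with no maxResponseSize-based truncation.
function asText(data) {
  return JSON.stringify(data, null, 2);
}

// Illustrative input only (not taken from the package).
const sampleResponse = { markdown: '# Example', metadata: { statusCode: 200 } };
console.log(asText(sampleResponse)); // printed in full, regardless of length
```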
@@ -168,24 +170,35 @@ const scrapeParamsSchema = z.object({
 }),
 ]))
 .optional(),
+parsers: z
+.array(z.union([
+z.enum(['pdf']),
+z.object({
+type: z.enum(['pdf']),
+maxPages: z.number().int().min(1).max(10000).optional(),
+}),
+]))
+.optional(),
 onlyMainContent: z.boolean().optional(),
 includeTags: z.array(z.string()).optional(),
 excludeTags: z.array(z.string()).optional(),
 waitFor: z.number().optional(),
-...(SAFE_MODE
-
-
-
-
-
-
-
-
-
-
-
-
+...(SAFE_MODE
+? {}
+: {
+actions: z
+.array(z.object({
+type: z.enum(allowedActionTypes),
+selector: z.string().optional(),
+milliseconds: z.number().optional(),
+text: z.string().optional(),
+key: z.string().optional(),
+direction: z.enum(['up', 'down']).optional(),
+script: z.string().optional(),
+fullPage: z.boolean().optional(),
+}))
+.optional(),
+}),
 mobile: z.boolean().optional(),
 skipTlsVerification: z.boolean().optional(),
 removeBase64Images: z.boolean().optional(),
@@ -197,12 +210,11 @@ const scrapeParamsSchema = z.object({
 .optional(),
 storeInCache: z.boolean().optional(),
 maxAge: z.number().optional(),
-maxResponseSize: z.number().optional(),
 });
 server.addTool({
 name: 'firecrawl_scrape',
 description: `
-Scrape content from a single URL with advanced options.
+Scrape content from a single URL with advanced options.
 This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs.
 
 **Best for:** Single page content extraction, when you know exactly which page contains the information.
@@ -216,24 +228,27 @@ This is the most powerful, fastest and most reliable scraper tool, if available
 "arguments": {
 "url": "https://example.com",
 "formats": ["markdown"],
-"maxAge": 172800000
-"maxResponseSize": 50000
+"maxAge": 172800000
 }
 }
 \`\`\`
 **Performance:** Add maxAge parameter for 500% faster scrapes using cached data.
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility (e.g., 50000 characters).
 **Returns:** Markdown, HTML, or other formats as specified.
-${SAFE_MODE
+${SAFE_MODE
+? '**Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.'
+: ''}
 `,
 parameters: scrapeParamsSchema,
 execute: async (args, { session, log }) => {
-const { url,
+const { url, ...options } = args;
 const client = getClient(session);
 const cleaned = removeEmptyTopLevel(options);
 log.info('Scraping URL', { url: String(url) });
-const res = await client.scrape(String(url), {
-
+const res = await client.scrape(String(url), {
+...cleaned,
+origin: ORIGIN,
+});
+return asText(res);
 },
 });
 server.addTool({
@@ -244,15 +259,13 @@ Map a website to discover all indexed URLs on the site.
 **Best for:** Discovering URLs on a website before deciding what to scrape; finding specific sections of a website.
 **Not recommended for:** When you already know which specific URL you need (use scrape or batch_scrape); when you need the content of the pages (use scrape after mapping).
 **Common mistakes:** Using crawl to discover URLs instead of map.
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Prompt Example:** "List all URLs on example.com."
 **Usage Example:**
 \`\`\`json
 {
 "name": "firecrawl_map",
 "arguments": {
-"url": "https://example.com"
-"maxResponseSize": 50000
+"url": "https://example.com"
 }
 }
 \`\`\`
@@ -265,15 +278,17 @@ Map a website to discover all indexed URLs on the site.
 includeSubdomains: z.boolean().optional(),
 limit: z.number().optional(),
 ignoreQueryParameters: z.boolean().optional(),
-maxResponseSize: z.number().optional(),
 }),
 execute: async (args, { session, log }) => {
-const { url,
+const { url, ...options } = args;
 const client = getClient(session);
 const cleaned = removeEmptyTopLevel(options);
 log.info('Mapping URL', { url: String(url) });
-const res = await client.map(String(url), {
-
+const res = await client.map(String(url), {
+...cleaned,
+origin: ORIGIN,
+});
+return asText(res);
 },
 });
 server.addTool({
@@ -301,7 +316,9 @@ The query also supports search operators, that you can use if needed to refine t
 **Prompt Example:** "Find the latest research papers on AI published in 2023."
 **Sources:** web, images, news, default to web unless needed images or news.
 **Scrape Options:** Only use scrapeOptions when you think it is absolutely necessary. When you do so default to a lower limit to avoid timeouts, 5 or lower.
-**
+**Optimal Workflow:** Search first using firecrawl_search without formats, then after fetching the results, use the scrape tool to get the content of the relevantpage(s) that you want to scrape
+
+**Usage Example without formats (Preferred):**
 \`\`\`json
 {
 "name": "firecrawl_search",
@@ -331,12 +348,10 @@ The query also supports search operators, that you can use if needed to refine t
 "scrapeOptions": {
 "formats": ["markdown"],
 "onlyMainContent": true
-}
-"maxResponseSize": 50000
+}
 }
 }
 \`\`\`
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Returns:** Array of search results (with optional scraped content).
 `,
 parameters: z.object({
@@ -349,18 +364,17 @@ The query also supports search operators, that you can use if needed to refine t
 .array(z.object({ type: z.enum(['web', 'images', 'news']) }))
 .optional(),
 scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
-maxResponseSize: z.number().optional(),
 }),
 execute: async (args, { session, log }) => {
 const client = getClient(session);
-const { query,
+const { query, ...opts } = args;
 const cleaned = removeEmptyTopLevel(opts);
 log.info('Searching', { query: String(query) });
 const res = await client.search(query, {
 ...cleaned,
 origin: ORIGIN,
 });
-return asText(res
+return asText(res);
 },
 });
 server.addTool({
@@ -383,14 +397,14 @@ server.addTool({
 "limit": 20,
 "allowExternalLinks": false,
 "deduplicateSimilarURLs": true,
-"sitemap": "include"
-"maxResponseSize": 50000
+"sitemap": "include"
 }
 }
 \`\`\`
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
-${SAFE_MODE
+${SAFE_MODE
+? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
+: ''}
 `,
 parameters: z.object({
 url: z.string(),
@@ -405,24 +419,25 @@ server.addTool({
 crawlEntireDomain: z.boolean().optional(),
 delay: z.number().optional(),
 maxConcurrency: z.number().optional(),
-...(SAFE_MODE
-
-
-z
-
-
-
-
-
-
+...(SAFE_MODE
+? {}
+: {
+webhook: z
+.union([
+z.string(),
+z.object({
+url: z.string(),
+headers: z.record(z.string(), z.string()).optional(),
+}),
+])
+.optional(),
+}),
 deduplicateSimilarURLs: z.boolean().optional(),
 ignoreQueryParameters: z.boolean().optional(),
 scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
-maxResponseSize: z.number().optional(),
 }),
 execute: async (args, { session, log }) => {
-const { url,
+const { url, ...options } = args;
 const client = getClient(session);
 const cleaned = removeEmptyTopLevel(options);
 log.info('Starting crawl', { url: String(url) });
@@ -430,7 +445,7 @@ server.addTool({
 ...cleaned,
 origin: ORIGIN,
 });
-return asText(res
+return asText(res);
 },
 });
 server.addTool({
@@ -443,23 +458,17 @@ Check the status of a crawl job.
 {
 "name": "firecrawl_check_crawl_status",
 "arguments": {
-"id": "550e8400-e29b-41d4-a716-446655440000"
-"maxResponseSize": 50000
+"id": "550e8400-e29b-41d4-a716-446655440000"
 }
 }
 \`\`\`
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Returns:** Status and progress of the crawl job, including results if available.
 `,
-parameters: z.object({
-id: z.string(),
-maxResponseSize: z.number().optional(),
-}),
+parameters: z.object({ id: z.string() }),
 execute: async (args, { session }) => {
-const { id, maxResponseSize } = args;
 const client = getClient(session);
-const res = await client.getCrawlStatus(id);
-return asText(res
+const res = await client.getCrawlStatus(args.id);
+return asText(res);
 },
 });
 server.addTool({
@@ -495,12 +504,10 @@ Extract structured information from web pages using LLM capabilities. Supports b
 },
 "allowExternalLinks": false,
 "enableWebSearch": false,
-"includeSubdomains": false
-"maxResponseSize": 50000
+"includeSubdomains": false
 }
 }
 \`\`\`
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Returns:** Extracted structured data as defined by your schema.
 `,
 parameters: z.object({
@@ -510,7 +517,6 @@ Extract structured information from web pages using LLM capabilities. Supports b
 allowExternalLinks: z.boolean().optional(),
 enableWebSearch: z.boolean().optional(),
 includeSubdomains: z.boolean().optional(),
-maxResponseSize: z.number().optional(),
 }),
 execute: async (args, { session, log }) => {
 const client = getClient(session);
@@ -528,7 +534,7 @@ Extract structured information from web pages using LLM capabilities. Supports b
 origin: ORIGIN,
 });
 const res = await client.extract(extractBody);
-return asText(res
+return asText(res);
 },
 });
 const PORT = Number(process.env.PORT || 3000);
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "firecrawl-mcp",
-"version": "3.3.5",
+"version": "3.4.0",
 "description": "MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.",
 "type": "module",
 "bin": {