firecrawl-mcp 3.3.6 → 3.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +51 -17
- package/dist/index.js +66 -34
- package/package.json +1 -1
package/README.md
CHANGED
@@ -12,7 +12,6 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec
 
 > Big thanks to [@vrknetha](https://github.com/vrknetha), [@knacklabs](https://www.knacklabs.ai) for the initial implementation!
 
-
 ## Features
 
 - Web scraping, crawling, and discovery
@@ -64,7 +63,7 @@ To configure Firecrawl MCP in Cursor **v0.48.6**
   }
 }
 ```
-
+
 To configure Firecrawl MCP in Cursor **v0.45.6**
 
 1. Open Cursor Settings
@@ -75,8 +74,6 @@ To configure Firecrawl MCP in Cursor **v0.45.6**
    - Type: "command"
    - Command: `env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp`
 
-
-
 > If you are using Windows and are running into issues, try `cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"`
 
 Replace `your-api-key` with your Firecrawl API key. If you don't have one yet, you can create an account and get it from https://www.firecrawl.dev/app/api-keys
@@ -101,15 +98,15 @@ Add this to your `./codeium/windsurf/model_config.json`:
 }
 ```
 
-### Running with
+### Running with Streamable HTTP Local Mode
 
-To run the server using
+To run the server using Streamable HTTP locally instead of the default stdio transport:
 
 ```bash
-env
+env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
 ```
 
-Use the url: http://localhost:3000/
+Use the url: http://localhost:3000/mcp
 
 ### Installing via Smithery (Legacy)
 
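Editor's note: the Streamable HTTP mode added above exposes a standard MCP endpoint at http://localhost:3000/mcp. As a quick way to verify it, a client can connect with the official MCP TypeScript SDK. This is a hedged sketch, not part of the package; the client name, version, and log line are illustrative assumptions.

```typescript
// Sketch only: connect to the locally running firecrawl-mcp Streamable HTTP
// endpoint and list its tools. Assumes @modelcontextprotocol/sdk is installed.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const transport = new StreamableHTTPClientTransport(
  new URL("http://localhost:3000/mcp"), // URL documented in the README above
);
// Hypothetical client identity, used only for this smoke test.
const client = new Client({ name: "firecrawl-smoke-test", version: "0.0.1" });

await client.connect(transport);
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name)); // expect firecrawl_scrape, firecrawl_map, ...
await client.close();
```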
@@ -322,14 +319,14 @@ Use this guide to select the right tool for your task:
 
 ### Quick Reference Table
 
-| Tool
-
-| scrape
-| batch_scrape
-| map
-| crawl
-| search
-| extract
+| Tool         | Best for                            | Returns         |
+| ------------ | ----------------------------------- | --------------- |
+| scrape       | Single page content                 | markdown/html   |
+| batch_scrape | Multiple known URLs                 | markdown/html[] |
+| map          | Discovering URLs on a site          | URL[]           |
+| crawl        | Multi-page extraction (with limits) | markdown/html[] |
+| search       | Web search for info                 | results[]       |
+| extract      | Structured data from pages          | JSON            |
 
 ## Available Tools
 
@@ -338,20 +335,25 @@ Use this guide to select the right tool for your task:
 Scrape content from a single URL with advanced options.
 
 **Best for:**
+
 - Single page content extraction, when you know exactly which page contains the information.
 
 **Not recommended for:**
+
 - Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
 - When you're unsure which page contains the information (use search)
 - When you need structured data (use extract)
 
 **Common mistakes:**
+
 - Using scrape for a list of URLs (use batch_scrape instead).
 
 **Prompt Example:**
+
 > "Get the content of the page at https://example.com."
 
 **Usage Example:**
+
 ```json
 {
   "name": "firecrawl_scrape",
@@ -370,6 +372,7 @@ Scrape content from a single URL with advanced options.
 ```
 
 **Returns:**
+
 - Markdown, HTML, or other formats as specified.
 
 ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
@@ -377,19 +380,24 @@ Scrape content from a single URL with advanced options.
 Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.
 
 **Best for:**
+
 - Retrieving content from multiple pages, when you know exactly which pages to scrape.
 
 **Not recommended for:**
+
 - Discovering URLs (use map first if you don't know the URLs)
 - Scraping a single page (use scrape)
 
 **Common mistakes:**
+
 - Using batch_scrape with too many URLs at once (may hit rate limits or token overflow)
 
 **Prompt Example:**
+
 > "Get the content of these three blog posts: [url1, url2, url3]."
 
 **Usage Example:**
+
 ```json
 {
   "name": "firecrawl_batch_scrape",
@@ -404,6 +412,7 @@ Scrape multiple URLs efficiently with built-in rate limiting and parallel proces
 ```
 
 **Returns:**
+
 - Response includes operation ID for status checking:
 
 ```json
@@ -436,20 +445,25 @@ Check the status of a batch operation.
 Map a website to discover all indexed URLs on the site.
 
 **Best for:**
+
 - Discovering URLs on a website before deciding what to scrape
 - Finding specific sections of a website
 
 **Not recommended for:**
+
 - When you already know which specific URL you need (use scrape or batch_scrape)
 - When you need the content of the pages (use scrape after mapping)
 
 **Common mistakes:**
+
 - Using crawl to discover URLs instead of map
 
 **Prompt Example:**
+
 > "List all URLs on example.com."
 
 **Usage Example:**
+
 ```json
 {
   "name": "firecrawl_map",
@@ -460,6 +474,7 @@ Map a website to discover all indexed URLs on the site.
 ```
 
 **Returns:**
+
 - Array of URLs found on the site
 
 ### 5. Search Tool (`firecrawl_search`)
@@ -467,17 +482,21 @@ Map a website to discover all indexed URLs on the site.
 Search the web and optionally extract content from search results.
 
 **Best for:**
+
 - Finding specific information across multiple websites, when you don't know which website has the information.
 - When you need the most relevant content for a query
 
 **Not recommended for:**
+
 - When you already know which website to scrape (use scrape)
 - When you need comprehensive coverage of a single website (use map or crawl)
 
 **Common mistakes:**
+
 - Using crawl or map for open-ended questions (use search instead)
 
 **Usage Example:**
+
 ```json
 {
   "name": "firecrawl_search",
@@ -495,9 +514,11 @@ Search the web and optionally extract content from search results.
 ```
 
 **Returns:**
+
 - Array of search results (with optional scraped content)
 
 **Prompt Example:**
+
 > "Find the latest research papers on AI published in 2023."
 
 ### 6. Crawl Tool (`firecrawl_crawl`)
@@ -505,9 +526,11 @@ Search the web and optionally extract content from search results.
 Starts an asynchronous crawl job on a website and extract content from all pages.
 
 **Best for:**
+
 - Extracting content from multiple related pages, when you need comprehensive coverage.
 
 **Not recommended for:**
+
 - Extracting content from a single page (use scrape)
 - When token limits are a concern (use map + batch_scrape)
 - When you need fast results (crawling can be slow)
@@ -515,13 +538,16 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
 
 **Common mistakes:**
+
 - Setting limit or maxDepth too high (causes token overflow)
 - Using crawl for a single page (use scrape instead)
 
 **Prompt Example:**
+
 > "Get all blog posts from the first two levels of example.com/blog."
 
 **Usage Example:**
+
 ```json
 {
   "name": "firecrawl_crawl",
@@ -536,6 +562,7 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 ```
 
 **Returns:**
+
 - Response includes operation ID for status checking:
 
 ```json
@@ -564,20 +591,24 @@ Check the status of a crawl job.
 ```
 
 **Returns:**
+
 - Response includes the status of the crawl job:
-
+
 ### 8. Extract Tool (`firecrawl_extract`)
 
 Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
 
 **Best for:**
+
 - Extracting specific structured data like prices, names, details.
 
 **Not recommended for:**
+
 - When you need the full content of a page (use scrape)
 - When you're not looking for specific structured data
 
 **Arguments:**
+
 - `urls`: Array of URLs to extract information from
 - `prompt`: Custom prompt for the LLM extraction
 - `systemPrompt`: System prompt to guide the LLM
@@ -588,9 +619,11 @@ Extract structured information from web pages using LLM capabilities. Supports b
 
 When using a self-hosted instance, the extraction will use your configured LLM. For cloud API, it uses Firecrawl's managed LLM service.
 **Prompt Example:**
+
 > "Extract the product name, price, and description from these product pages."
 
 **Usage Example:**
+
 ```json
 {
   "name": "firecrawl_extract",
@@ -615,6 +648,7 @@ When using a self-hosted instance, the extraction will use your configured LLM.
 ```
 
 **Returns:**
+
 - Extracted structured data as defined by your schema
 
 ```json
package/dist/index.js
CHANGED
@@ -36,9 +36,9 @@ function removeEmptyTopLevel(obj) {
     return out;
 }
 class ConsoleLogger {
-    shouldLog =
+    shouldLog = process.env.CLOUD_SERVICE === 'true' ||
         process.env.SSE_LOCAL === 'true' ||
-        process.env.HTTP_STREAMABLE_SERVER === 'true'
+        process.env.HTTP_STREAMABLE_SERVER === 'true';
     debug(...args) {
         if (this.shouldLog) {
             console.debug('[DEBUG]', new Date().toISOString(), ...args);
@@ -119,7 +119,8 @@ function getClient(session) {
         return createClient(session.firecrawlApiKey);
     }
     // For self-hosted instances, API key is optional if FIRECRAWL_API_URL is provided
-    if (!process.env.FIRECRAWL_API_URL &&
+    if (!process.env.FIRECRAWL_API_URL &&
+        (!session || !session.firecrawlApiKey)) {
         throw new Error('Unauthorized: API key is required when not using a self-hosted instance');
     }
     return createClient(session?.firecrawlApiKey);
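Editor's note: this hunk tightens the client factory so that a request with neither FIRECRAWL_API_URL nor a per-session key is rejected, while self-hosted deployments keep the key optional. A minimal config sketch for the self-hosted case, assuming a hypothetical internal endpoint and the `mcpServers` shape the README uses for Cursor and similar clients:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_URL": "https://firecrawl.internal.example.com"
      }
    }
  }
}
```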
@@ -131,7 +132,13 @@ function asText(data) {
 // Centralized scrape params (used by scrape, and referenced in search/crawl scrapeOptions)
 // Define safe action types
 const safeActionTypes = ['wait', 'screenshot', 'scroll', 'scrape'];
-const otherActions = [
+const otherActions = [
+    'click',
+    'write',
+    'press',
+    'executeJavascript',
+    'generatePDF',
+];
 const allActionTypes = [...safeActionTypes, ...otherActions];
 // Use appropriate action types based on safe mode
 const allowedActionTypes = SAFE_MODE ? safeActionTypes : allActionTypes;
@@ -163,24 +170,35 @@ const scrapeParamsSchema = z.object({
     }),
 ]))
     .optional(),
+    parsers: z
+        .array(z.union([
+        z.enum(['pdf']),
+        z.object({
+            type: z.enum(['pdf']),
+            maxPages: z.number().int().min(1).max(10000).optional(),
+        }),
+    ]))
+        .optional(),
     onlyMainContent: z.boolean().optional(),
     includeTags: z.array(z.string()).optional(),
     excludeTags: z.array(z.string()).optional(),
     waitFor: z.number().optional(),
-    ...(SAFE_MODE
-
-
-
-
-
-
-
-
-
-
-
-
-
+    ...(SAFE_MODE
+        ? {}
+        : {
+            actions: z
+                .array(z.object({
+                type: z.enum(allowedActionTypes),
+                selector: z.string().optional(),
+                milliseconds: z.number().optional(),
+                text: z.string().optional(),
+                key: z.string().optional(),
+                direction: z.enum(['up', 'down']).optional(),
+                script: z.string().optional(),
+                fullPage: z.boolean().optional(),
+            }))
+                .optional(),
+        }),
     mobile: z.boolean().optional(),
     skipTlsVerification: z.boolean().optional(),
     removeBase64Images: z.boolean().optional(),
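Editor's note: the new `parsers` entry accepts either the string `'pdf'` or an object with an optional `maxPages` cap (1-10000). A hypothetical firecrawl_scrape call exercising it, written in the README's usage-example style (`formats` is assumed from the surrounding schema; URL and page cap are placeholders):

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/whitepaper.pdf",
    "formats": ["markdown"],
    "parsers": [{ "type": "pdf", "maxPages": 25 }]
  }
}
```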
@@ -216,7 +234,9 @@ This is the most powerful, fastest and most reliable scraper tool, if available
 \`\`\`
 **Performance:** Add maxAge parameter for 500% faster scrapes using cached data.
 **Returns:** Markdown, HTML, or other formats as specified.
-${SAFE_MODE
+${SAFE_MODE
+    ? '**Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.'
+    : ''}
 `,
     parameters: scrapeParamsSchema,
     execute: async (args, { session, log }) => {
@@ -224,7 +244,10 @@ ${SAFE_MODE ? '**Safe Mode:** Read-only content extraction. Interactive actions
         const client = getClient(session);
         const cleaned = removeEmptyTopLevel(options);
         log.info('Scraping URL', { url: String(url) });
-        const res = await client.scrape(String(url), {
+        const res = await client.scrape(String(url), {
+            ...cleaned,
+            origin: ORIGIN,
+        });
         return asText(res);
     },
 });
@@ -261,7 +284,10 @@ Map a website to discover all indexed URLs on the site.
         const client = getClient(session);
         const cleaned = removeEmptyTopLevel(options);
         log.info('Mapping URL', { url: String(url) });
-        const res = await client.map(String(url), {
+        const res = await client.map(String(url), {
+            ...cleaned,
+            origin: ORIGIN,
+        });
         return asText(res);
     },
 });
@@ -290,7 +316,9 @@ The query also supports search operators, that you can use if needed to refine t
 **Prompt Example:** "Find the latest research papers on AI published in 2023."
 **Sources:** web, images, news, default to web unless needed images or news.
 **Scrape Options:** Only use scrapeOptions when you think it is absolutely necessary. When you do so default to a lower limit to avoid timeouts, 5 or lower.
-**
+**Optimal Workflow:** Search first using firecrawl_search without formats, then after fetching the results, use the scrape tool to get the content of the relevant page(s) that you want to scrape
+
+**Usage Example without formats (Preferred):**
 \`\`\`json
 {
     "name": "firecrawl_search",
@@ -374,7 +402,9 @@ server.addTool({
     }
 \`\`\`
 **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
-${SAFE_MODE
+${SAFE_MODE
+    ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
+    : ''}
 `,
     parameters: z.object({
         url: z.string(),
@@ -389,17 +419,19 @@
         crawlEntireDomain: z.boolean().optional(),
         delay: z.number().optional(),
         maxConcurrency: z.number().optional(),
-        ...(SAFE_MODE
-
-
-        z
-
-
-
-
-
-
-
+        ...(SAFE_MODE
+            ? {}
+            : {
+                webhook: z
+                    .union([
+                    z.string(),
+                    z.object({
+                        url: z.string(),
+                        headers: z.record(z.string(), z.string()).optional(),
+                    }),
+                ])
+                    .optional(),
+            }),
         deduplicateSimilarURLs: z.boolean().optional(),
         ignoreQueryParameters: z.boolean().optional(),
         scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
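Editor's note: outside safe mode, the `webhook` parameter above accepts either a plain URL string or an object carrying custom headers. A hypothetical crawl request using the object form, in the README's usage-example style (the webhook URL, bearer token, and `limit` value are placeholders):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog",
    "limit": 20,
    "webhook": {
      "url": "https://hooks.example.com/firecrawl",
      "headers": { "Authorization": "Bearer your-webhook-secret" }
    }
  }
}
```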
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "firecrawl-mcp",
-  "version": "3.3.6",
+  "version": "3.4.0",
   "description": "MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.",
   "type": "module",
   "bin": {