firecrawl-mcp 3.3.6 → 3.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +51 -17
  2. package/dist/index.js +66 -34
  3. package/package.json +2 -1
package/README.md CHANGED
@@ -12,7 +12,6 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec

  > Big thanks to [@vrknetha](https://github.com/vrknetha), [@knacklabs](https://www.knacklabs.ai) for the initial implementation!

-
  ## Features

  - Web scraping, crawling, and discovery
@@ -64,7 +63,7 @@ To configure Firecrawl MCP in Cursor **v0.48.6**
  }
  }
  ```
-
+
  To configure Firecrawl MCP in Cursor **v0.45.6**

  1. Open Cursor Settings
@@ -75,8 +74,6 @@ To configure Firecrawl MCP in Cursor **v0.45.6**
  - Type: "command"
  - Command: `env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp`

-
-
  > If you are using Windows and are running into issues, try `cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"`

  Replace `your-api-key` with your Firecrawl API key. If you don't have one yet, you can create an account and get it from https://www.firecrawl.dev/app/api-keys
@@ -101,15 +98,15 @@ Add this to your `./codeium/windsurf/model_config.json`:
  }
  ```

- ### Running with SSE Local Mode
+ ### Running with Streamable HTTP Local Mode

- To run the server using Server-Sent Events (SSE) locally instead of the default stdio transport:
+ To run the server using Streamable HTTP locally instead of the default stdio transport:

  ```bash
- env SSE_LOCAL=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
+ env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
  ```

- Use the url: http://localhost:3000/sse
+ Use the url: http://localhost:3000/mcp

  ### Installing via Smithery (Legacy)

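For reference, a minimal sketch of connecting to the new Streamable HTTP endpoint from the official MCP TypeScript SDK. Only the `http://localhost:3000/mcp` URL comes from the README above; the client name, version, and SDK usage are illustrative assumptions, not part of this package.

```ts
// Minimal sketch: connect an MCP client to the server started with
// HTTP_STREAMABLE_SERVER=true. The /mcp endpoint comes from the README
// above; the client name/version are arbitrary placeholders.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const transport = new StreamableHTTPClientTransport(
  new URL("http://localhost:3000/mcp")
);
const client = new Client({ name: "example-client", version: "0.0.1" });

await client.connect(transport);
const { tools } = await client.listTools(); // firecrawl_scrape, firecrawl_search, ...
console.log(tools.map((t) => t.name));
await client.close();
```
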
@@ -322,14 +319,14 @@ Use this guide to select the right tool for your task:

  ### Quick Reference Table

- | Tool | Best for | Returns |
- |---------------------|------------------------------------------|-----------------|
- | scrape | Single page content | markdown/html |
- | batch_scrape | Multiple known URLs | markdown/html[] |
- | map | Discovering URLs on a site | URL[] |
- | crawl | Multi-page extraction (with limits) | markdown/html[] |
- | search | Web search for info | results[] |
- | extract | Structured data from pages | JSON |
+ | Tool         | Best for                            | Returns         |
+ | ------------ | ----------------------------------- | --------------- |
+ | scrape       | Single page content                 | markdown/html   |
+ | batch_scrape | Multiple known URLs                 | markdown/html[] |
+ | map          | Discovering URLs on a site          | URL[]           |
+ | crawl        | Multi-page extraction (with limits) | markdown/html[] |
+ | search       | Web search for info                 | results[]       |
+ | extract      | Structured data from pages          | JSON            |

  ## Available Tools

@@ -338,20 +335,25 @@ Use this guide to select the right tool for your task:
  Scrape content from a single URL with advanced options.

  **Best for:**
+
  - Single page content extraction, when you know exactly which page contains the information.

  **Not recommended for:**
+
  - Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
  - When you're unsure which page contains the information (use search)
  - When you need structured data (use extract)

  **Common mistakes:**
+
  - Using scrape for a list of URLs (use batch_scrape instead).

  **Prompt Example:**
+
  > "Get the content of the page at https://example.com."

  **Usage Example:**
+
  ```json
  {
  "name": "firecrawl_scrape",
@@ -370,6 +372,7 @@ Scrape content from a single URL with advanced options.
  ```

  **Returns:**
+
  - Markdown, HTML, or other formats as specified.

  ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
@@ -377,19 +380,24 @@ Scrape content from a single URL with advanced options.
  Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.

  **Best for:**
+
  - Retrieving content from multiple pages, when you know exactly which pages to scrape.

  **Not recommended for:**
+
  - Discovering URLs (use map first if you don't know the URLs)
  - Scraping a single page (use scrape)

  **Common mistakes:**
+
  - Using batch_scrape with too many URLs at once (may hit rate limits or token overflow)

  **Prompt Example:**
+
  > "Get the content of these three blog posts: [url1, url2, url3]."

  **Usage Example:**
+
  ```json
  {
  "name": "firecrawl_batch_scrape",
@@ -404,6 +412,7 @@ Scrape multiple URLs efficiently with built-in rate limiting and parallel proces
  ```

  **Returns:**
+
  - Response includes operation ID for status checking:

  ```json
@@ -436,20 +445,25 @@ Check the status of a batch operation.
  Map a website to discover all indexed URLs on the site.

  **Best for:**
+
  - Discovering URLs on a website before deciding what to scrape
  - Finding specific sections of a website

  **Not recommended for:**
+
  - When you already know which specific URL you need (use scrape or batch_scrape)
  - When you need the content of the pages (use scrape after mapping)

  **Common mistakes:**
+
  - Using crawl to discover URLs instead of map

  **Prompt Example:**
+
  > "List all URLs on example.com."

  **Usage Example:**
+
  ```json
  {
  "name": "firecrawl_map",
@@ -460,6 +474,7 @@ Map a website to discover all indexed URLs on the site.
  ```

  **Returns:**
+
  - Array of URLs found on the site

  ### 5. Search Tool (`firecrawl_search`)
@@ -467,17 +482,21 @@ Map a website to discover all indexed URLs on the site.
  Search the web and optionally extract content from search results.

  **Best for:**
+
  - Finding specific information across multiple websites, when you don't know which website has the information.
  - When you need the most relevant content for a query

  **Not recommended for:**
+
  - When you already know which website to scrape (use scrape)
  - When you need comprehensive coverage of a single website (use map or crawl)

  **Common mistakes:**
+
  - Using crawl or map for open-ended questions (use search instead)

  **Usage Example:**
+
  ```json
  {
  "name": "firecrawl_search",
@@ -495,9 +514,11 @@ Search the web and optionally extract content from search results.
  ```

  **Returns:**
+
  - Array of search results (with optional scraped content)

  **Prompt Example:**
+
  > "Find the latest research papers on AI published in 2023."

  ### 6. Crawl Tool (`firecrawl_crawl`)
@@ -505,9 +526,11 @@ Search the web and optionally extract content from search results.
  Starts an asynchronous crawl job on a website and extract content from all pages.

  **Best for:**
+
  - Extracting content from multiple related pages, when you need comprehensive coverage.

  **Not recommended for:**
+
  - Extracting content from a single page (use scrape)
  - When token limits are a concern (use map + batch_scrape)
  - When you need fast results (crawling can be slow)
@@ -515,13 +538,16 @@ Starts an asynchronous crawl job on a website and extract content from all pages
  **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.

  **Common mistakes:**
+
  - Setting limit or maxDepth too high (causes token overflow)
  - Using crawl for a single page (use scrape instead)

  **Prompt Example:**
+
  > "Get all blog posts from the first two levels of example.com/blog."

  **Usage Example:**
+
  ```json
  {
  "name": "firecrawl_crawl",
@@ -536,6 +562,7 @@ Starts an asynchronous crawl job on a website and extract content from all pages
  ```

  **Returns:**
+
  - Response includes operation ID for status checking:

  ```json
@@ -564,20 +591,24 @@ Check the status of a crawl job.
  ```

  **Returns:**
+
  - Response includes the status of the crawl job:
-
+
  ### 8. Extract Tool (`firecrawl_extract`)

  Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.

  **Best for:**
+
  - Extracting specific structured data like prices, names, details.

  **Not recommended for:**
+
  - When you need the full content of a page (use scrape)
  - When you're not looking for specific structured data

  **Arguments:**
+
  - `urls`: Array of URLs to extract information from
  - `prompt`: Custom prompt for the LLM extraction
  - `systemPrompt`: System prompt to guide the LLM
@@ -588,9 +619,11 @@ Extract structured information from web pages using LLM capabilities. Supports b

  When using a self-hosted instance, the extraction will use your configured LLM. For cloud API, it uses Firecrawl's managed LLM service.
  **Prompt Example:**
+
  > "Extract the product name, price, and description from these product pages."

  **Usage Example:**
+
  ```json
  {
  "name": "firecrawl_extract",
@@ -615,6 +648,7 @@ When using a self-hosted instance, the extraction will use your configured LLM.
  ```

  **Returns:**
+
  - Extracted structured data as defined by your schema

  ```json
package/dist/index.js CHANGED
@@ -36,9 +36,9 @@ function removeEmptyTopLevel(obj) {
  return out;
  }
  class ConsoleLogger {
- shouldLog = (process.env.CLOUD_SERVICE === 'true' ||
+ shouldLog = process.env.CLOUD_SERVICE === 'true' ||
  process.env.SSE_LOCAL === 'true' ||
- process.env.HTTP_STREAMABLE_SERVER === 'true');
+ process.env.HTTP_STREAMABLE_SERVER === 'true';
  debug(...args) {
  if (this.shouldLog) {
  console.debug('[DEBUG]', new Date().toISOString(), ...args);
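A side note on this guard: logging stays enabled only in the cloud, SSE, and Streamable HTTP modes, likely because under the default stdio transport stdout carries the JSON-RPC stream and Node's `console.debug` writes to stdout, so stray output would corrupt protocol framing. A hedged standalone sketch of the same pattern (the rationale is an inference, not stated by the package):

```ts
// Hedged sketch of the logging gate above. In stdio mode, stdout is the
// MCP JSON-RPC channel, so console.debug (stdout in Node) stays disabled;
// the HTTP-based server modes can log freely.
const shouldLog =
  process.env.CLOUD_SERVICE === 'true' ||
  process.env.SSE_LOCAL === 'true' ||
  process.env.HTTP_STREAMABLE_SERVER === 'true';

function debugLog(...args: unknown[]): void {
  if (shouldLog) {
    console.debug('[DEBUG]', new Date().toISOString(), ...args);
  }
}

debugLog('server starting'); // no-op under the default stdio transport
```
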
@@ -119,7 +119,8 @@ function getClient(session) {
  return createClient(session.firecrawlApiKey);
  }
  // For self-hosted instances, API key is optional if FIRECRAWL_API_URL is provided
- if (!process.env.FIRECRAWL_API_URL && (!session || !session.firecrawlApiKey)) {
+ if (!process.env.FIRECRAWL_API_URL &&
+ (!session || !session.firecrawlApiKey)) {
  throw new Error('Unauthorized: API key is required when not using a self-hosted instance');
  }
  return createClient(session?.firecrawlApiKey);
@@ -131,7 +132,13 @@ function asText(data) {
  // Centralized scrape params (used by scrape, and referenced in search/crawl scrapeOptions)
  // Define safe action types
  const safeActionTypes = ['wait', 'screenshot', 'scroll', 'scrape'];
- const otherActions = ['click', 'write', 'press', 'executeJavascript', 'generatePDF'];
+ const otherActions = [
+ 'click',
+ 'write',
+ 'press',
+ 'executeJavascript',
+ 'generatePDF',
+ ];
  const allActionTypes = [...safeActionTypes, ...otherActions];
  // Use appropriate action types based on safe mode
  const allowedActionTypes = SAFE_MODE ? safeActionTypes : allActionTypes;
@@ -163,24 +170,35 @@ const scrapeParamsSchema = z.object({
  }),
  ]))
  .optional(),
+ parsers: z
+ .array(z.union([
+ z.enum(['pdf']),
+ z.object({
+ type: z.enum(['pdf']),
+ maxPages: z.number().int().min(1).max(10000).optional(),
+ }),
+ ]))
+ .optional(),
  onlyMainContent: z.boolean().optional(),
  includeTags: z.array(z.string()).optional(),
  excludeTags: z.array(z.string()).optional(),
  waitFor: z.number().optional(),
- ...(SAFE_MODE ? {} : {
- actions: z
- .array(z.object({
- type: z.enum(allowedActionTypes),
- selector: z.string().optional(),
- milliseconds: z.number().optional(),
- text: z.string().optional(),
- key: z.string().optional(),
- direction: z.enum(['up', 'down']).optional(),
- script: z.string().optional(),
- fullPage: z.boolean().optional(),
- }))
- .optional(),
- }),
+ ...(SAFE_MODE
+ ? {}
+ : {
+ actions: z
+ .array(z.object({
+ type: z.enum(allowedActionTypes),
+ selector: z.string().optional(),
+ milliseconds: z.number().optional(),
+ text: z.string().optional(),
+ key: z.string().optional(),
+ direction: z.enum(['up', 'down']).optional(),
+ script: z.string().optional(),
+ fullPage: z.boolean().optional(),
+ }))
+ .optional(),
+ }),
  mobile: z.boolean().optional(),
  skipTlsVerification: z.boolean().optional(),
  removeBase64Images: z.boolean().optional(),
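The `parsers` field added above is the main schema change in this release: each entry is either the string `'pdf'` or an object `{ type: 'pdf', maxPages }` with `maxPages` between 1 and 10000. A hedged sketch of a conforming `firecrawl_scrape` call, reusing the connected `client` from the earlier SDK sketch; everything except the argument shape is illustrative.

```ts
// Sketch only: the `arguments` shape mirrors the parsers schema above;
// `client` is assumed to be a connected @modelcontextprotocol/sdk Client.
const result = await client.callTool({
  name: 'firecrawl_scrape',
  arguments: {
    url: 'https://example.com/whitepaper.pdf', // illustrative URL
    formats: ['markdown'],
    // Parse as PDF, but stop after the first 20 pages (maxPages: 1-10000).
    parsers: [{ type: 'pdf', maxPages: 20 }],
  },
});
```
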
@@ -216,7 +234,9 @@ This is the most powerful, fastest and most reliable scraper tool, if available
  \`\`\`
  **Performance:** Add maxAge parameter for 500% faster scrapes using cached data.
  **Returns:** Markdown, HTML, or other formats as specified.
- ${SAFE_MODE ? '**Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.' : ''}
+ ${SAFE_MODE
+ ? '**Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.'
+ : ''}
  `,
  parameters: scrapeParamsSchema,
  execute: async (args, { session, log }) => {
@@ -224,7 +244,10 @@ ${SAFE_MODE ? '**Safe Mode:** Read-only content extraction. Interactive actions
  const client = getClient(session);
  const cleaned = removeEmptyTopLevel(options);
  log.info('Scraping URL', { url: String(url) });
- const res = await client.scrape(String(url), { ...cleaned, origin: ORIGIN });
+ const res = await client.scrape(String(url), {
+ ...cleaned,
+ origin: ORIGIN,
+ });
  return asText(res);
  },
  });
@@ -261,7 +284,10 @@ Map a website to discover all indexed URLs on the site.
  const client = getClient(session);
  const cleaned = removeEmptyTopLevel(options);
  log.info('Mapping URL', { url: String(url) });
- const res = await client.map(String(url), { ...cleaned, origin: ORIGIN });
+ const res = await client.map(String(url), {
+ ...cleaned,
+ origin: ORIGIN,
+ });
  return asText(res);
  },
  });
@@ -290,7 +316,9 @@ The query also supports search operators, that you can use if needed to refine t
  **Prompt Example:** "Find the latest research papers on AI published in 2023."
  **Sources:** web, images, news, default to web unless needed images or news.
  **Scrape Options:** Only use scrapeOptions when you think it is absolutely necessary. When you do so default to a lower limit to avoid timeouts, 5 or lower.
- **Usage Example without formats:**
+ **Optimal Workflow:** Search first using firecrawl_search without formats, then after fetching the results, use the scrape tool to get the content of the relevantpage(s) that you want to scrape
+
+ **Usage Example without formats (Preferred):**
  \`\`\`json
  {
  "name": "firecrawl_search",
@@ -374,7 +402,9 @@ server.addTool({
  }
  \`\`\`
  **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
- ${SAFE_MODE ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.' : ''}
+ ${SAFE_MODE
+ ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
+ : ''}
  `,
  parameters: z.object({
  url: z.string(),
@@ -389,17 +419,19 @@ server.addTool({
  crawlEntireDomain: z.boolean().optional(),
  delay: z.number().optional(),
  maxConcurrency: z.number().optional(),
- ...(SAFE_MODE ? {} : {
- webhook: z
- .union([
- z.string(),
- z.object({
- url: z.string(),
- headers: z.record(z.string(), z.string()).optional(),
- }),
- ])
- .optional(),
- }),
+ ...(SAFE_MODE
+ ? {}
+ : {
+ webhook: z
+ .union([
+ z.string(),
+ z.object({
+ url: z.string(),
+ headers: z.record(z.string(), z.string()).optional(),
+ }),
+ ])
+ .optional(),
+ }),
  deduplicateSimilarURLs: z.boolean().optional(),
  ignoreQueryParameters: z.boolean().optional(),
  scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
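Both SAFE_MODE hunks in this file rely on the same idiom: spreading a ternary into an object literal so that schema keys (`actions`, `webhook`) exist only when safe mode is off. A minimal standalone sketch of the pattern with zod, using illustrative names (how the package actually derives SAFE_MODE is not shown in this diff):

```ts
import { z } from 'zod';

// Illustrative: mirrors the ...(SAFE_MODE ? {} : { ... }) idiom above.
// Spreading {} adds no keys, so `webhook` is absent from the schema
// entirely (not merely optional) when safe mode is on.
const SAFE_MODE = process.env.SAFE_MODE === 'true'; // assumed definition

const crawlParams = z.object({
  url: z.string(),
  ...(SAFE_MODE
    ? {}
    : {
        webhook: z.union([z.string(), z.object({ url: z.string() })]).optional(),
      }),
});
```
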
package/package.json CHANGED
@@ -1,8 +1,9 @@
  {
  "name": "firecrawl-mcp",
- "version": "3.3.6",
+ "version": "3.5.0",
  "description": "MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.",
  "type": "module",
+ "mcpName": "io.github.firecrawl/firecrawl-mcp-server",
  "bin": {
  "firecrawl-mcp": "dist/index.js"
  },