firecrawl-mcp 3.3.5 → 3.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +51 -36
- package/dist/index.js +87 -81
- package/package.json +1 -1
package/README.md
CHANGED
@@ -12,7 +12,6 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec
 
 > Big thanks to [@vrknetha](https://github.com/vrknetha), [@knacklabs](https://www.knacklabs.ai) for the initial implementation!
 
-
 ## Features
 
 - Web scraping, crawling, and discovery
@@ -21,25 +20,6 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec
 - Automatic retries and rate limiting
 - Cloud and self-hosted support
 - SSE support
-- **Context limit support for MCP compatibility**
-
-## Context Limiting for MCP
-
-All tools now support the `maxResponseSize` parameter to limit response size for better MCP compatibility. This is especially useful for large responses that may exceed MCP context limits.
-
-**Example Usage:**
-```json
-{
-"name": "firecrawl_scrape",
-"arguments": {
-"url": "https://example.com",
-"formats": ["markdown"],
-"maxResponseSize": 50000
-}
-}
-```
-
-When the response exceeds the specified limit, content will be truncated with a clear message indicating truncation occurred. This parameter is optional and preserves full backward compatibility.
 
 > Play around with [our MCP Server on MCP.so's playground](https://mcp.so/playground?server=firecrawl-mcp-server) or on [Klavis AI](https://www.klavis.ai/mcp-servers).
 
@@ -83,7 +63,7 @@ To configure Firecrawl MCP in Cursor **v0.48.6**
 }
 }
 ```
-
+
 To configure Firecrawl MCP in Cursor **v0.45.6**
 
 1. Open Cursor Settings
@@ -94,8 +74,6 @@ To configure Firecrawl MCP in Cursor **v0.45.6**
 - Type: "command"
 - Command: `env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp`
 
-
-
 > If you are using Windows and are running into issues, try `cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"`
 
 Replace `your-api-key` with your Firecrawl API key. If you don't have one yet, you can create an account and get it from https://www.firecrawl.dev/app/api-keys
@@ -120,15 +98,15 @@ Add this to your `./codeium/windsurf/model_config.json`:
 }
 ```
 
-### Running with
+### Running with Streamable HTTP Local Mode
 
-To run the server using
+To run the server using Streamable HTTP locally instead of the default stdio transport:
 
 ```bash
-env
+env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
 ```
 
-Use the url: http://localhost:3000/
+Use the url: http://localhost:3000/mcp
 
 ### Installing via Smithery (Legacy)
 
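For context on the Streamable HTTP hunk above: once the server is started with `HTTP_STREAMABLE_SERVER=true`, an MCP client connects to `http://localhost:3000/mcp`. A minimal sketch, assuming the `@modelcontextprotocol/sdk` client package and its Streamable HTTP transport (neither is part of this diff; the import paths and call signatures below belong to that SDK, not to firecrawl-mcp):

```javascript
// Minimal client sketch (assumption: @modelcontextprotocol/sdk is installed;
// its client APIs are not part of this package diff).
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

// Server started beforehand with:
//   env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3000/mcp'));
const client = new Client({ name: 'example-client', version: '1.0.0' });

await client.connect(transport);

// Call one of the tools documented in the README.
const result = await client.callTool({
  name: 'firecrawl_scrape',
  arguments: { url: 'https://example.com', formats: ['markdown'] },
});
console.log(result);

await client.close();
```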
@@ -341,14 +319,14 @@ Use this guide to select the right tool for your task:
 
 ### Quick Reference Table
 
-| Tool
-
-| scrape
-| batch_scrape
-| map
-| crawl
-| search
-| extract
+| Tool | Best for | Returns |
+| ------------ | ----------------------------------- | --------------- |
+| scrape | Single page content | markdown/html |
+| batch_scrape | Multiple known URLs | markdown/html[] |
+| map | Discovering URLs on a site | URL[] |
+| crawl | Multi-page extraction (with limits) | markdown/html[] |
+| search | Web search for info | results[] |
+| extract | Structured data from pages | JSON |
 
 ## Available Tools
 
@@ -357,20 +335,25 @@ Use this guide to select the right tool for your task:
 Scrape content from a single URL with advanced options.
 
 **Best for:**
+
 - Single page content extraction, when you know exactly which page contains the information.
 
 **Not recommended for:**
+
 - Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
 - When you're unsure which page contains the information (use search)
 - When you need structured data (use extract)
 
 **Common mistakes:**
+
 - Using scrape for a list of URLs (use batch_scrape instead).
 
 **Prompt Example:**
+
 > "Get the content of the page at https://example.com."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_scrape",
@@ -389,6 +372,7 @@ Scrape content from a single URL with advanced options.
 ```
 
 **Returns:**
+
 - Markdown, HTML, or other formats as specified.
 
 ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
@@ -396,19 +380,24 @@ Scrape content from a single URL with advanced options.
 Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.
 
 **Best for:**
+
 - Retrieving content from multiple pages, when you know exactly which pages to scrape.
 
 **Not recommended for:**
+
 - Discovering URLs (use map first if you don't know the URLs)
 - Scraping a single page (use scrape)
 
 **Common mistakes:**
+
 - Using batch_scrape with too many URLs at once (may hit rate limits or token overflow)
 
 **Prompt Example:**
+
 > "Get the content of these three blog posts: [url1, url2, url3]."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_batch_scrape",
@@ -423,6 +412,7 @@ Scrape multiple URLs efficiently with built-in rate limiting and parallel proces
 ```
 
 **Returns:**
+
 - Response includes operation ID for status checking:
 
 ```json
@@ -455,20 +445,25 @@ Check the status of a batch operation.
 Map a website to discover all indexed URLs on the site.
 
 **Best for:**
+
 - Discovering URLs on a website before deciding what to scrape
 - Finding specific sections of a website
 
 **Not recommended for:**
+
 - When you already know which specific URL you need (use scrape or batch_scrape)
 - When you need the content of the pages (use scrape after mapping)
 
 **Common mistakes:**
+
 - Using crawl to discover URLs instead of map
 
 **Prompt Example:**
+
 > "List all URLs on example.com."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_map",
@@ -479,6 +474,7 @@ Map a website to discover all indexed URLs on the site.
 ```
 
 **Returns:**
+
 - Array of URLs found on the site
 
 ### 5. Search Tool (`firecrawl_search`)
@@ -486,17 +482,21 @@ Map a website to discover all indexed URLs on the site.
 Search the web and optionally extract content from search results.
 
 **Best for:**
+
 - Finding specific information across multiple websites, when you don't know which website has the information.
 - When you need the most relevant content for a query
 
 **Not recommended for:**
+
 - When you already know which website to scrape (use scrape)
 - When you need comprehensive coverage of a single website (use map or crawl)
 
 **Common mistakes:**
+
 - Using crawl or map for open-ended questions (use search instead)
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_search",
@@ -514,9 +514,11 @@ Search the web and optionally extract content from search results.
 ```
 
 **Returns:**
+
 - Array of search results (with optional scraped content)
 
 **Prompt Example:**
+
 > "Find the latest research papers on AI published in 2023."
 
 ### 6. Crawl Tool (`firecrawl_crawl`)
@@ -524,9 +526,11 @@ Search the web and optionally extract content from search results.
 Starts an asynchronous crawl job on a website and extract content from all pages.
 
 **Best for:**
+
 - Extracting content from multiple related pages, when you need comprehensive coverage.
 
 **Not recommended for:**
+
 - Extracting content from a single page (use scrape)
 - When token limits are a concern (use map + batch_scrape)
 - When you need fast results (crawling can be slow)
@@ -534,13 +538,16 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
 
 **Common mistakes:**
+
 - Setting limit or maxDepth too high (causes token overflow)
 - Using crawl for a single page (use scrape instead)
 
 **Prompt Example:**
+
 > "Get all blog posts from the first two levels of example.com/blog."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_crawl",
@@ -555,6 +562,7 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 ```
 
 **Returns:**
+
 - Response includes operation ID for status checking:
 
 ```json
@@ -583,20 +591,24 @@ Check the status of a crawl job.
 ```
 
 **Returns:**
+
 - Response includes the status of the crawl job:
-
+
 ### 8. Extract Tool (`firecrawl_extract`)
 
 Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
 
 **Best for:**
+
 - Extracting specific structured data like prices, names, details.
 
 **Not recommended for:**
+
 - When you need the full content of a page (use scrape)
 - When you're not looking for specific structured data
 
 **Arguments:**
+
 - `urls`: Array of URLs to extract information from
 - `prompt`: Custom prompt for the LLM extraction
 - `systemPrompt`: System prompt to guide the LLM
@@ -607,9 +619,11 @@ Extract structured information from web pages using LLM capabilities. Supports b
 
 When using a self-hosted instance, the extraction will use your configured LLM. For cloud API, it uses Firecrawl's managed LLM service.
 **Prompt Example:**
+
 > "Extract the product name, price, and description from these product pages."
 
 **Usage Example:**
+
 ```json
 {
 "name": "firecrawl_extract",
@@ -634,6 +648,7 @@ When using a self-hosted instance, the extraction will use your configured LLM.
 ```
 
 **Returns:**
+
 - Extracted structured data as defined by your schema
 
 ```json
package/dist/index.js
CHANGED
@@ -36,9 +36,9 @@ function removeEmptyTopLevel(obj) {
 return out;
 }
 class ConsoleLogger {
-shouldLog =
+shouldLog = process.env.CLOUD_SERVICE === 'true' ||
 process.env.SSE_LOCAL === 'true' ||
-process.env.HTTP_STREAMABLE_SERVER === 'true'
+process.env.HTTP_STREAMABLE_SERVER === 'true';
 debug(...args) {
 if (this.shouldLog) {
 console.debug('[DEBUG]', new Date().toISOString(), ...args);
@@ -119,24 +119,26 @@ function getClient(session) {
 return createClient(session.firecrawlApiKey);
 }
 // For self-hosted instances, API key is optional if FIRECRAWL_API_URL is provided
-if (!process.env.FIRECRAWL_API_URL &&
+if (!process.env.FIRECRAWL_API_URL &&
+(!session || !session.firecrawlApiKey)) {
 throw new Error('Unauthorized: API key is required when not using a self-hosted instance');
 }
 return createClient(session?.firecrawlApiKey);
 }
-function asText(data
-
-if (maxResponseSize && maxResponseSize > 0 && text.length > maxResponseSize) {
-const truncatedText = text.substring(0, maxResponseSize - 100); // Reserve space for truncation message
-return truncatedText + '\n\n[Content truncated due to size limit. Increase maxResponseSize parameter to see full content.]';
-}
-return text;
+function asText(data) {
+return JSON.stringify(data, null, 2);
 }
 // scrape tool (v2 semantics, minimal args)
 // Centralized scrape params (used by scrape, and referenced in search/crawl scrapeOptions)
 // Define safe action types
 const safeActionTypes = ['wait', 'screenshot', 'scroll', 'scrape'];
-const otherActions = [
+const otherActions = [
+'click',
+'write',
+'press',
+'executeJavascript',
+'generatePDF',
+];
 const allActionTypes = [...safeActionTypes, ...otherActions];
 // Use appropriate action types based on safe mode
 const allowedActionTypes = SAFE_MODE ? safeActionTypes : allActionTypes;
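The practical effect of the `asText` change above is that tool responses are no longer truncated client-side: the `maxResponseSize` handling on the removed lines is gone and every tool now returns the full SDK response, pretty-printed as JSON. A small sketch of the new behavior (the helper is copied from the hunk; the sample response object is illustrative only):

```javascript
// 3.4.0 behavior as shown in the hunk above: full pretty-printed JSON,
// with no maxResponseSize-based truncation.
function asText(data) {
  return JSON.stringify(data, null, 2);
}

// Illustrative input only (not taken from the package).
const sampleResponse = { markdown: '# Example', metadata: { statusCode: 200 } };
console.log(asText(sampleResponse)); // printed in full, regardless of length
```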
@@ -168,24 +170,35 @@ const scrapeParamsSchema = z.object({
 }),
 ]))
 .optional(),
+parsers: z
+.array(z.union([
+z.enum(['pdf']),
+z.object({
+type: z.enum(['pdf']),
+maxPages: z.number().int().min(1).max(10000).optional(),
+}),
+]))
+.optional(),
 onlyMainContent: z.boolean().optional(),
 includeTags: z.array(z.string()).optional(),
 excludeTags: z.array(z.string()).optional(),
 waitFor: z.number().optional(),
-...(SAFE_MODE
-
-
-
-
-
-
-
-
-
-
-
-
+...(SAFE_MODE
+? {}
+: {
+actions: z
+.array(z.object({
+type: z.enum(allowedActionTypes),
+selector: z.string().optional(),
+milliseconds: z.number().optional(),
+text: z.string().optional(),
+key: z.string().optional(),
+direction: z.enum(['up', 'down']).optional(),
+script: z.string().optional(),
+fullPage: z.boolean().optional(),
+}))
+.optional(),
+}),
 mobile: z.boolean().optional(),
 skipTlsVerification: z.boolean().optional(),
 removeBase64Images: z.boolean().optional(),
@@ -197,12 +210,11 @@ const scrapeParamsSchema = z.object({
 .optional(),
 storeInCache: z.boolean().optional(),
 maxAge: z.number().optional(),
-maxResponseSize: z.number().optional(),
 });
 server.addTool({
 name: 'firecrawl_scrape',
 description: `
-Scrape content from a single URL with advanced options.
+Scrape content from a single URL with advanced options.
 This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs.
 
 **Best for:** Single page content extraction, when you know exactly which page contains the information.
@@ -216,24 +228,27 @@ This is the most powerful, fastest and most reliable scraper tool, if available
 "arguments": {
 "url": "https://example.com",
 "formats": ["markdown"],
-"maxAge": 172800000
-"maxResponseSize": 50000
+"maxAge": 172800000
 }
 }
 \`\`\`
 **Performance:** Add maxAge parameter for 500% faster scrapes using cached data.
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility (e.g., 50000 characters).
 **Returns:** Markdown, HTML, or other formats as specified.
-${SAFE_MODE
+${SAFE_MODE
+? '**Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.'
+: ''}
 `,
 parameters: scrapeParamsSchema,
 execute: async (args, { session, log }) => {
-const { url,
+const { url, ...options } = args;
 const client = getClient(session);
 const cleaned = removeEmptyTopLevel(options);
 log.info('Scraping URL', { url: String(url) });
-const res = await client.scrape(String(url), {
-
+const res = await client.scrape(String(url), {
+...cleaned,
+origin: ORIGIN,
+});
+return asText(res);
 },
 });
 server.addTool({
@@ -244,15 +259,13 @@ Map a website to discover all indexed URLs on the site.
 **Best for:** Discovering URLs on a website before deciding what to scrape; finding specific sections of a website.
 **Not recommended for:** When you already know which specific URL you need (use scrape or batch_scrape); when you need the content of the pages (use scrape after mapping).
 **Common mistakes:** Using crawl to discover URLs instead of map.
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Prompt Example:** "List all URLs on example.com."
 **Usage Example:**
 \`\`\`json
 {
 "name": "firecrawl_map",
 "arguments": {
-"url": "https://example.com"
-"maxResponseSize": 50000
+"url": "https://example.com"
 }
 }
 \`\`\`
@@ -265,15 +278,17 @@ Map a website to discover all indexed URLs on the site.
 includeSubdomains: z.boolean().optional(),
 limit: z.number().optional(),
 ignoreQueryParameters: z.boolean().optional(),
-maxResponseSize: z.number().optional(),
 }),
 execute: async (args, { session, log }) => {
-const { url,
+const { url, ...options } = args;
 const client = getClient(session);
 const cleaned = removeEmptyTopLevel(options);
 log.info('Mapping URL', { url: String(url) });
-const res = await client.map(String(url), {
-
+const res = await client.map(String(url), {
+...cleaned,
+origin: ORIGIN,
+});
+return asText(res);
 },
 });
 server.addTool({
@@ -301,7 +316,9 @@ The query also supports search operators, that you can use if needed to refine t
 **Prompt Example:** "Find the latest research papers on AI published in 2023."
 **Sources:** web, images, news, default to web unless needed images or news.
 **Scrape Options:** Only use scrapeOptions when you think it is absolutely necessary. When you do so default to a lower limit to avoid timeouts, 5 or lower.
-**
+**Optimal Workflow:** Search first using firecrawl_search without formats, then after fetching the results, use the scrape tool to get the content of the relevantpage(s) that you want to scrape
+
+**Usage Example without formats (Preferred):**
 \`\`\`json
 {
 "name": "firecrawl_search",
@@ -331,12 +348,10 @@ The query also supports search operators, that you can use if needed to refine t
 "scrapeOptions": {
 "formats": ["markdown"],
 "onlyMainContent": true
-}
-"maxResponseSize": 50000
+}
 }
 }
 \`\`\`
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Returns:** Array of search results (with optional scraped content).
 `,
 parameters: z.object({
@@ -349,18 +364,17 @@ The query also supports search operators, that you can use if needed to refine t
 .array(z.object({ type: z.enum(['web', 'images', 'news']) }))
 .optional(),
 scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
-maxResponseSize: z.number().optional(),
 }),
 execute: async (args, { session, log }) => {
 const client = getClient(session);
-const { query,
+const { query, ...opts } = args;
 const cleaned = removeEmptyTopLevel(opts);
 log.info('Searching', { query: String(query) });
 const res = await client.search(query, {
 ...cleaned,
 origin: ORIGIN,
 });
-return asText(res
+return asText(res);
 },
 });
 server.addTool({
@@ -383,14 +397,14 @@ server.addTool({
 "limit": 20,
 "allowExternalLinks": false,
 "deduplicateSimilarURLs": true,
-"sitemap": "include"
-"maxResponseSize": 50000
+"sitemap": "include"
 }
 }
 \`\`\`
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
-${SAFE_MODE
+${SAFE_MODE
+? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
+: ''}
 `,
 parameters: z.object({
 url: z.string(),
@@ -405,24 +419,25 @@ server.addTool({
 crawlEntireDomain: z.boolean().optional(),
 delay: z.number().optional(),
 maxConcurrency: z.number().optional(),
-...(SAFE_MODE
-
-
-z
-
-
-
-
-
-
+...(SAFE_MODE
+? {}
+: {
+webhook: z
+.union([
+z.string(),
+z.object({
+url: z.string(),
+headers: z.record(z.string(), z.string()).optional(),
+}),
+])
+.optional(),
+}),
 deduplicateSimilarURLs: z.boolean().optional(),
 ignoreQueryParameters: z.boolean().optional(),
 scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
-maxResponseSize: z.number().optional(),
 }),
 execute: async (args, { session, log }) => {
-const { url,
+const { url, ...options } = args;
 const client = getClient(session);
 const cleaned = removeEmptyTopLevel(options);
 log.info('Starting crawl', { url: String(url) });
@@ -430,7 +445,7 @@ server.addTool({
 ...cleaned,
 origin: ORIGIN,
 });
-return asText(res
+return asText(res);
 },
 });
 server.addTool({
@@ -443,23 +458,17 @@ Check the status of a crawl job.
 {
 "name": "firecrawl_check_crawl_status",
 "arguments": {
-"id": "550e8400-e29b-41d4-a716-446655440000"
-"maxResponseSize": 50000
+"id": "550e8400-e29b-41d4-a716-446655440000"
 }
 }
 \`\`\`
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Returns:** Status and progress of the crawl job, including results if available.
 `,
-parameters: z.object({
-id: z.string(),
-maxResponseSize: z.number().optional(),
-}),
+parameters: z.object({ id: z.string() }),
 execute: async (args, { session }) => {
-const { id, maxResponseSize } = args;
 const client = getClient(session);
-const res = await client.getCrawlStatus(id);
-return asText(res
+const res = await client.getCrawlStatus(args.id);
+return asText(res);
 },
 });
 server.addTool({
@@ -495,12 +504,10 @@ Extract structured information from web pages using LLM capabilities. Supports b
 },
 "allowExternalLinks": false,
 "enableWebSearch": false,
-"includeSubdomains": false
-"maxResponseSize": 50000
+"includeSubdomains": false
 }
 }
 \`\`\`
-**Context Limiting:** Use maxResponseSize parameter to limit response size for MCP compatibility.
 **Returns:** Extracted structured data as defined by your schema.
 `,
 parameters: z.object({
@@ -510,7 +517,6 @@ Extract structured information from web pages using LLM capabilities. Supports b
 allowExternalLinks: z.boolean().optional(),
 enableWebSearch: z.boolean().optional(),
 includeSubdomains: z.boolean().optional(),
-maxResponseSize: z.number().optional(),
 }),
 execute: async (args, { session, log }) => {
 const client = getClient(session);
@@ -528,7 +534,7 @@ Extract structured information from web pages using LLM capabilities. Supports b
 origin: ORIGIN,
 });
 const res = await client.extract(extractBody);
-return asText(res
+return asText(res);
 },
 });
 const PORT = Number(process.env.PORT || 3000);
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "firecrawl-mcp",
-"version": "3.3.5",
+"version": "3.4.0",
 "description": "MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.",
 "type": "module",
 "bin": {