npm - firecrawl-mcp - Versions diffs - 3.7.4 → 3.9.0 - Mend

firecrawl-mcp 3.7.4 → 3.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/LICENSE +0 -0
package/README.md +270 -22
package/dist/index-v1.js +1313 -0
package/dist/index.js +290 -35
package/dist/index.test.js +255 -0
package/dist/jest.setup.js +58 -0
package/dist/server-v1.js +1154 -0
package/dist/server-v2.js +1067 -0
package/dist/src/index.js +1053 -0
package/dist/src/index.test.js +225 -0
package/dist/versioned-server.js +203 -0
package/package.json +2 -2

package/LICENSE CHANGED

File without changes

package/README.md CHANGED

@@ -17,6 +17,7 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec
 - Web scraping, crawling, and discovery
 - Search and content extraction
 - Deep research and batch scraping
+- Cloud browser sessions with agent-browser automation
 - Automatic retries and rate limiting
 - Cloud and self-hosted support
 - SSE support
@@ -310,23 +311,32 @@ The server utilizes Firecrawl's built-in rate limiting and batch processing capa
 Use this guide to select the right tool for your task:
 - **If you know the exact URL(s) you want:**
-  - For one: use **scrape**
+  - For one: use **scrape** (with JSON format for structured data)
   - For many: use **batch_scrape**
 - **If you need to discover URLs on a site:** use **map**
 - **If you want to search the web for info:** use **search**
-- **If you want to extract structured data:** use **extract**
+- **If you need complex research across multiple unknown sources:** use **agent**
 - **If you want to analyze a whole site or section:** use **crawl** (with limits!)
+- **If you need interactive browser automation** (click, type, navigate): use **browser**
 ### Quick Reference Table
-| Tool         | Best for                            | Returns         |
-| ------------ | ----------------------------------- | --------------- |
-| scrape       | Single page content                 | markdown/html   |
-| batch_scrape | Multiple known URLs                 | markdown/html[] |
-| map          | Discovering URLs on a site          | URL[]           |
-| crawl        | Multi-page extraction (with limits) | markdown/html[] |
-| search       | Web search for info                 | results[]       |
-| extract      | Structured data from pages          | JSON            |
+| Tool         | Best for                            | Returns                    |
+| ------------ | ----------------------------------- | -------------------------- |
+| scrape       | Single page content                 | JSON (preferred) or markdown |
+| batch_scrape | Multiple known URLs                 | JSON (preferred) or markdown[] |
+| map          | Discovering URLs on a site          | URL[]                      |
+| crawl        | Multi-page extraction (with limits) | markdown/html[]            |
+| search       | Web search for info                 | results[]                  |
+| agent        | Complex multi-source research       | JSON (structured data)     |
+| browser      | Interactive multi-step automation    | Session with live browser  |
+### Format Selection Guide
+When using `scrape` or `batch_scrape`, choose the right format:
+- **JSON format (recommended for most cases):** Use when you need specific data from a page. Define a schema based on what you need to extract. This keeps responses small and avoids context window overflow.
+- **Markdown format (use sparingly):** Only when you genuinely need the full page content, such as reading an entire article for summarization or analyzing page structure.
 ## Available Tools
@@ -342,38 +352,75 @@ Scrape content from a single URL with advanced options.
 - Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
 - When you're unsure which page contains the information (use search)
-- When you need structured data (use extract)
 **Common mistakes:**
 - Using scrape for a list of URLs (use batch_scrape instead).
+- Using markdown format by default (use JSON format to extract only what you need).
+**Choosing the right format:**
+- **JSON format (preferred):** For most use cases, use JSON format with a schema to extract only the specific data needed. This keeps responses focused and prevents context window overflow.
+- **Markdown format:** Only when the task genuinely requires full page content (e.g., summarizing an entire article, analyzing page structure).
 **Prompt Example:**
-> "Get the content of the page at https://example.com."
+> "Get the product details from https://example.com/product."
-**Usage Example:**
+**Usage Example (JSON format - preferred):**
 ```json
 {
   "name": "firecrawl_scrape",
   "arguments": {
-    "url": "https://example.com",
+    "url": "https://example.com/product",
+    "formats": [{
+      "type": "json",
+      "prompt": "Extract the product information",
+      "schema": {
+        "type": "object",
+        "properties": {
+          "name": { "type": "string" },
+          "price": { "type": "number" },
+          "description": { "type": "string" }
+        },
+        "required": ["name", "price"]
+      }
+    }]
+  }
+}
+```
+**Usage Example (markdown format - when full content needed):**
+```json
+{
+  "name": "firecrawl_scrape",
+  "arguments": {
+    "url": "https://example.com/article",
     "formats": ["markdown"],
-    "onlyMainContent": true,
-    "waitFor": 1000,
-    "timeout": 30000,
-    "mobile": false,
-    "includeTags": ["article", "main"],
-    "excludeTags": ["nav", "footer"],
-    "skipTlsVerification": false
+    "onlyMainContent": true
+  }
+}
+```
+**Usage Example (branding format - extract brand identity):**
+```json
+{
+  "name": "firecrawl_scrape",
+  "arguments": {
+    "url": "https://example.com",
+    "formats": ["branding"]
   }
 }
 ```
+**Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication.
 **Returns:**
-- Markdown, HTML, or other formats as specified.
+- JSON structured data, markdown, branding profile, or other formats as specified.
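As an illustration outside the diff itself, the JSON-format request shown in the new README could be assembled programmatically. This is a minimal sketch: the helper name `build_json_scrape_call` and its parameters are hypothetical, and only the resulting payload shape (`formats` holding an object with `type`, `prompt`, and `schema`) is taken from the usage example above.

```python
from typing import Any


def build_json_scrape_call(url: str, prompt: str,
                           properties: dict[str, str],
                           required: list[str]) -> dict[str, Any]:
    """Build a firecrawl_scrape tool call that requests the JSON format.

    `properties` maps field names to JSON Schema type names,
    e.g. {"price": "number"}. Hypothetical helper, not part of the package.
    """
    # Reject schemas that mark a field as required without defining it.
    missing = [name for name in required if name not in properties]
    if missing:
        raise ValueError(f"required fields not in schema: {missing}")

    schema = {
        "type": "object",
        "properties": {name: {"type": t} for name, t in properties.items()},
        "required": list(required),
    }
    return {
        "name": "firecrawl_scrape",
        "arguments": {
            "url": url,
            # Same shape as the README's "JSON format - preferred" example.
            "formats": [{"type": "json", "prompt": prompt, "schema": schema}],
        },
    }


call = build_json_scrape_call(
    "https://example.com/product",
    "Extract the product information",
    {"name": "string", "price": "number", "description": "string"},
    ["name", "price"],
)
```

Keeping the schema small is the point of the README's guidance: the response then carries only the requested fields rather than the full page.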
 ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
@@ -667,6 +714,207 @@ When using a self-hosted instance, the extraction will use your configured LLM.
 }
 ```
+### 9. Agent Tool (`firecrawl_agent`)
+Autonomous web research agent. This is a separate AI agent layer that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query.
+**How it works:**
+The agent performs web searches, follows links, reads pages, and gathers data autonomously. This runs **asynchronously** - it returns a job ID immediately, and you poll `firecrawl_agent_status` to check when complete and retrieve results.
+**Async workflow:**
+1. Call `firecrawl_agent` with your prompt/schema → returns job ID
+2. Do other work while the agent researches (can take minutes for complex queries)
+3. Poll `firecrawl_agent_status` with the job ID to check progress
+4. When status is "completed", the response includes the extracted data
+**Best for:**
+- Complex research tasks where you don't know the exact URLs
+- Multi-source data gathering
+- Finding information scattered across the web
+- Tasks where you can do other work while waiting for results
+**Not recommended for:**
+- Simple single-page scraping where you know the URL (use scrape with JSON format - faster and cheaper)
+**Arguments:**
+- `prompt`: Natural language description of the data you want (required, max 10,000 characters)
+- `urls`: Optional array of URLs to focus the agent on specific pages
+- `schema`: Optional JSON schema for structured output
+**Prompt Example:**
+> "Find the founders of Firecrawl and their backgrounds"
+**Usage Example (start agent, then poll for results):**
+```json
+{
+  "name": "firecrawl_agent",
+  "arguments": {
+    "prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts",
+    "schema": {
+      "type": "object",
+      "properties": {
+        "startups": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "properties": {
+              "name": { "type": "string" },
+              "funding": { "type": "string" },
+              "founded": { "type": "string" }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+```
+Then poll with `firecrawl_agent_status` using the returned job ID.
+**Usage Example (with URLs - agent focuses on specific pages):**
+```json
+{
+  "name": "firecrawl_agent",
+  "arguments": {
+    "urls": ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
+    "prompt": "Compare the features and pricing information from these pages"
+  }
+}
+```
+**Returns:**
+- Job ID for status checking. Use `firecrawl_agent_status` to poll for results.
+### 10. Check Agent Status (`firecrawl_agent_status`)
+Check the status of an agent job and retrieve results when complete. Use this to poll for results after starting an agent.
+**Polling pattern:** Agent research can take minutes for complex queries. Poll this endpoint periodically (e.g., every 10-30 seconds) until status is "completed" or "failed".
+```json
+{
+  "name": "firecrawl_agent_status",
+  "arguments": {
+    "id": "550e8400-e29b-41d4-a716-446655440000"
+  }
+}
+```
+**Possible statuses:**
+- `processing`: Agent is still researching - check back later
+- `completed`: Research finished - response includes the extracted data
+- `failed`: An error occurred
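As an illustration outside the diff itself, the polling pattern described above can be sketched as a small loop. Several things here are assumptions rather than documented behavior: `call_tool` is a hypothetical callable standing in for whatever MCP client invokes the tool, and the response is assumed to be a dictionary with a `status` key. Only the tool name, the `id` argument, and the three status values come from the README.

```python
import time
from typing import Any, Callable

# Hypothetical client hook: (tool name, arguments) -> parsed response.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_agent(call_tool: ToolCaller, job_id: str,
                   interval_s: float = 15.0, timeout_s: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll firecrawl_agent_status until the job completes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = call_tool("firecrawl_agent_status", {"id": job_id})
        status = response.get("status")  # assumed response field
        if status == "completed":
            return response
        if status == "failed":
            raise RuntimeError(f"agent job {job_id} failed")
        # Any other status (e.g. "processing") means the job is still running.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"agent job {job_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

The default interval sits inside the 10-30 second range the README suggests; the `sleep` parameter exists only so the loop can be exercised without real delays.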
+### 11. Browser Create (`firecrawl_browser_create`)
+Create a persistent cloud browser session for interactive automation.
+**Best for:**
+- Multi-step browser automation (navigate, click, fill forms, extract data)
+- Interactive workflows that require maintaining state across actions
+- Testing and debugging web pages in a live browser
+**Arguments:**
+- `ttl`: Total session lifetime in seconds (30-3600, optional)
+- `activityTtl`: Idle timeout in seconds (10-3600, optional)
+- `streamWebView`: Whether to enable live view streaming (optional)
+**Usage Example:**
+```json
+{
+  "name": "firecrawl_browser_create",
+  "arguments": {
+    "ttl": 600
+  }
+}
+```
+**Returns:**
+- Session ID, CDP URL, and live view URL
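As an illustration outside the diff itself, a caller might check the documented ranges before creating a session. The helper name `browser_create_args` is hypothetical, and whether the server itself rejects out-of-range values is not stated in the README; the sketch only encodes the ranges listed under Arguments.

```python
from typing import Any, Optional


def browser_create_args(ttl: Optional[int] = None,
                        activity_ttl: Optional[int] = None,
                        stream_web_view: Optional[bool] = None) -> dict[str, Any]:
    """Build arguments for firecrawl_browser_create, checking documented ranges."""
    args: dict[str, Any] = {}
    if ttl is not None:
        # README: total session lifetime, 30-3600 seconds.
        if not 30 <= ttl <= 3600:
            raise ValueError("ttl must be between 30 and 3600 seconds")
        args["ttl"] = ttl
    if activity_ttl is not None:
        # README: idle timeout, 10-3600 seconds.
        if not 10 <= activity_ttl <= 3600:
            raise ValueError("activityTtl must be between 10 and 3600 seconds")
        args["activityTtl"] = activity_ttl
    if stream_web_view is not None:
        args["streamWebView"] = stream_web_view
    return args


request = {"name": "firecrawl_browser_create",
           "arguments": browser_create_args(ttl=600, activity_ttl=120)}
```

All three arguments are optional, so omitted values are simply left out of the payload.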
+### 12. Browser Execute (`firecrawl_browser_execute`)
+Execute code in a browser session. Supports agent-browser commands (bash), Python, or JavaScript.
+**Recommended: Use bash with agent-browser commands** (pre-installed in every sandbox):
+```json
+{
+  "name": "firecrawl_browser_execute",
+  "arguments": {
+    "sessionId": "session-id-here",
+    "code": "agent-browser open https://example.com",
+    "language": "bash"
+  }
+}
+```
+**Common agent-browser commands:**
+| Command | Description |
+|---------|-------------|
+| `agent-browser open <url>` | Navigate to URL |
+| `agent-browser snapshot` | Accessibility tree with clickable refs |
+| `agent-browser click @e5` | Click element by ref from snapshot |
+| `agent-browser type @e3 "text"` | Type into element |
+| `agent-browser get title` | Get page title |
+| `agent-browser screenshot` | Take screenshot |
+| `agent-browser --help` | Full command reference |
+**For Playwright scripting, use Python:**
+```json
+{
+  "name": "firecrawl_browser_execute",
+  "arguments": {
+    "sessionId": "session-id-here",
+    "code": "await page.goto('https://example.com')\ntitle = await page.title()\nprint(title)",
+    "language": "python"
+  }
+}
+```
+### 13. Browser List (`firecrawl_browser_list`)
+List browser sessions, optionally filtered by status.
+```json
+{
+  "name": "firecrawl_browser_list",
+  "arguments": {
+    "status": "active"
+  }
+}
+```
+### 14. Browser Delete (`firecrawl_browser_delete`)
+Destroy a browser session.
+```json
+{
+  "name": "firecrawl_browser_delete",
+  "arguments": {
+    "sessionId": "session-id-here"
+  }
+}
+```
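As an illustration outside the diff itself, the four browser tools above suggest a create, execute, delete lifecycle. The sketch below assumes a hypothetical `call_tool` callable for invoking MCP tools and assumes the create response exposes the session ID under a `sessionId` key; the README only says a session ID is returned, not how the field is named.

```python
from typing import Any, Callable

# Hypothetical client hook: (tool name, arguments) -> parsed response.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_in_browser(call_tool: ToolCaller, commands: list[str],
                   ttl: int = 600) -> list[dict[str, Any]]:
    """Run agent-browser commands in a fresh session and always clean it up."""
    session = call_tool("firecrawl_browser_create", {"ttl": ttl})
    session_id = session["sessionId"]  # assumed response field name
    outputs: list[dict[str, Any]] = []
    try:
        for command in commands:
            outputs.append(call_tool("firecrawl_browser_execute", {
                "sessionId": session_id,
                "code": command,
                "language": "bash",
            }))
    finally:
        # Delete the session even if a command raised, so it does not
        # linger until its TTL expires.
        call_tool("firecrawl_browser_delete", {"sessionId": session_id})
    return outputs
```

A typical call would pass commands from the table above, for example `["agent-browser open https://example.com", "agent-browser get title"]`.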
 ## Logging System
 The server includes comprehensive logging: