firecrawl-mcp 3.7.3 → 3.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE CHANGED
File without changes
package/README.md CHANGED
@@ -17,6 +17,7 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec
  - Web scraping, crawling, and discovery
  - Search and content extraction
  - Deep research and batch scraping
+ - Cloud browser sessions with agent-browser automation
  - Automatic retries and rate limiting
  - Cloud and self-hosted support
  - SSE support
@@ -310,23 +311,32 @@ The server utilizes Firecrawl's built-in rate limiting and batch processing capa
  Use this guide to select the right tool for your task:
 
  - **If you know the exact URL(s) you want:**
- - For one: use **scrape**
+ - For one: use **scrape** (with JSON format for structured data)
  - For many: use **batch_scrape**
  - **If you need to discover URLs on a site:** use **map**
  - **If you want to search the web for info:** use **search**
- - **If you want to extract structured data:** use **extract**
+ - **If you need complex research across multiple unknown sources:** use **agent**
  - **If you want to analyze a whole site or section:** use **crawl** (with limits!)
+ - **If you need interactive browser automation** (click, type, navigate): use **browser**
 
  ### Quick Reference Table
 
- | Tool         | Best for                            | Returns         |
- | ------------ | ----------------------------------- | --------------- |
- | scrape       | Single page content                 | markdown/html   |
- | batch_scrape | Multiple known URLs                 | markdown/html[] |
- | map          | Discovering URLs on a site          | URL[]           |
- | crawl        | Multi-page extraction (with limits) | markdown/html[] |
- | search       | Web search for info                 | results[]       |
- | extract      | Structured data from pages          | JSON            |
+ | Tool         | Best for                            | Returns                        |
+ | ------------ | ----------------------------------- | ------------------------------ |
+ | scrape       | Single page content                 | JSON (preferred) or markdown   |
+ | batch_scrape | Multiple known URLs                 | JSON (preferred) or markdown[] |
+ | map          | Discovering URLs on a site          | URL[]                          |
+ | crawl        | Multi-page extraction (with limits) | markdown/html[]                |
+ | search       | Web search for info                 | results[]                      |
+ | agent        | Complex multi-source research       | JSON (structured data)         |
+ | browser      | Interactive multi-step automation   | Session with live browser      |
+
+ ### Format Selection Guide
+
+ When using `scrape` or `batch_scrape`, choose the right format:
+
+ - **JSON format (recommended for most cases):** Use when you need specific data from a page. Define a schema based on what you need to extract. This keeps responses small and avoids context window overflow.
+ - **Markdown format (use sparingly):** Only when you genuinely need the full page content, such as reading an entire article for summarization or analyzing page structure.
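The decision guide above can be codified as a small helper when wiring tool selection into an automation layer. A sketch only: the tool names come from the table, but the task labels are invented here and are not part of the MCP API.

```python
def pick_tool(task: str, url_count: int = 0) -> str:
    """Map a task category to the recommended firecrawl tool.

    Task labels are illustrative, not part of the MCP API.
    """
    if task == "interactive":    # click, type, navigate in a live browser
        return "browser"
    if task == "research":       # complex research across unknown sources
        return "agent"
    if task == "discover_urls":  # enumerate URLs on a site
        return "map"
    if task == "web_search":     # search the web for info
        return "search"
    if task == "whole_site":     # analyze a site or section (with limits!)
        return "crawl"
    # Exact URL(s) known: scrape one page, batch_scrape many.
    return "scrape" if url_count == 1 else "batch_scrape"
```

For example, `pick_tool("known_urls", url_count=1)` selects `scrape`, which per the table should then be used with JSON format for structured data.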
 
  ## Available Tools
 
@@ -342,38 +352,75 @@ Scrape content from a single URL with advanced options.
  - Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
  - When you're unsure which page contains the information (use search)
- - When you need structured data (use extract)
 
  **Common mistakes:**
 
  - Using scrape for a list of URLs (use batch_scrape instead).
+ - Using markdown format by default (use JSON format to extract only what you need).
+
+ **Choosing the right format:**
+
+ - **JSON format (preferred):** For most use cases, use JSON format with a schema to extract only the specific data needed. This keeps responses focused and prevents context window overflow.
+ - **Markdown format:** Only when the task genuinely requires full page content (e.g., summarizing an entire article, analyzing page structure).
 
  **Prompt Example:**
 
- > "Get the content of the page at https://example.com."
+ > "Get the product details from https://example.com/product."
 
- **Usage Example:**
+ **Usage Example (JSON format - preferred):**
 
  ```json
  {
    "name": "firecrawl_scrape",
    "arguments": {
-     "url": "https://example.com",
+     "url": "https://example.com/product",
+     "formats": [{
+       "type": "json",
+       "prompt": "Extract the product information",
+       "schema": {
+         "type": "object",
+         "properties": {
+           "name": { "type": "string" },
+           "price": { "type": "number" },
+           "description": { "type": "string" }
+         },
+         "required": ["name", "price"]
+       }
+     }]
+   }
+ }
+ ```
+
+ **Usage Example (markdown format - when full content needed):**
+
+ ```json
+ {
+   "name": "firecrawl_scrape",
+   "arguments": {
+     "url": "https://example.com/article",
      "formats": ["markdown"],
-     "onlyMainContent": true,
-     "waitFor": 1000,
-     "timeout": 30000,
-     "mobile": false,
-     "includeTags": ["article", "main"],
-     "excludeTags": ["nav", "footer"],
-     "skipTlsVerification": false
+     "onlyMainContent": true
+   }
+ }
+ ```
+
+ **Usage Example (branding format - extract brand identity):**
+
+ ```json
+ {
+   "name": "firecrawl_scrape",
+   "arguments": {
+     "url": "https://example.com",
+     "formats": ["branding"]
    }
  }
  ```
 
+ **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication.
+
  **Returns:**
 
- - Markdown, HTML, or other formats as specified.
+ - JSON structured data, markdown, branding profile, or other formats as specified.
 
  ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
 
@@ -667,6 +714,207 @@ When using a self-hosted instance, the extraction will use your configured LLM.
  }
  ```
 
+ ### 9. Agent Tool (`firecrawl_agent`)
+
+ Autonomous web research agent. This is a separate AI agent layer that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query.
+
+ **How it works:**
+
+ The agent performs web searches, follows links, reads pages, and gathers data autonomously. This runs **asynchronously** - it returns a job ID immediately, and you poll `firecrawl_agent_status` to check when complete and retrieve results.
+
+ **Async workflow:**
+
+ 1. Call `firecrawl_agent` with your prompt/schema → returns job ID
+ 2. Do other work while the agent researches (can take minutes for complex queries)
+ 3. Poll `firecrawl_agent_status` with the job ID to check progress
+ 4. When status is "completed", the response includes the extracted data
+
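The four-step workflow above is a standard start-then-poll loop. A minimal sketch: `start_agent` and `agent_status` are hypothetical stubs standing in for the `firecrawl_agent` and `firecrawl_agent_status` tool calls, which a real MCP client would dispatch to the server.

```python
import time

# Hypothetical stubs. A real client would invoke the MCP tools
# firecrawl_agent and firecrawl_agent_status instead.
def start_agent(prompt: str) -> str:
    # The tool returns a job ID immediately, before research finishes.
    return "job-123"

def agent_status(job_id: str) -> dict:
    # A real poll hits the API; this stub simulates a finished job.
    return {"status": "completed", "data": {"founders": []}}

def run_agent(prompt: str, poll_interval: float = 15.0) -> dict:
    """Start an agent job, then poll until it completes or fails."""
    job_id = start_agent(prompt)
    while True:
        result = agent_status(job_id)
        if result["status"] in ("completed", "failed"):
            return result
        time.sleep(poll_interval)  # check back every 10-30 seconds

result = run_agent("Find the founders of Firecrawl and their backgrounds")
```

In practice the loop body is where you would interleave other work, since complex queries can take minutes.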
+ **Best for:**
+
+ - Complex research tasks where you don't know the exact URLs
+ - Multi-source data gathering
+ - Finding information scattered across the web
+ - Tasks where you can do other work while waiting for results
+
+ **Not recommended for:**
+
+ - Simple single-page scraping where you know the URL (use scrape with JSON format - faster and cheaper)
+
+ **Arguments:**
+
+ - `prompt`: Natural language description of the data you want (required, max 10,000 characters)
+ - `urls`: Optional array of URLs to focus the agent on specific pages
+ - `schema`: Optional JSON schema for structured output
+
+ **Prompt Example:**
+
+ > "Find the founders of Firecrawl and their backgrounds"
+
+ **Usage Example (start agent, then poll for results):**
+
+ ```json
+ {
+   "name": "firecrawl_agent",
+   "arguments": {
+     "prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts",
+     "schema": {
+       "type": "object",
+       "properties": {
+         "startups": {
+           "type": "array",
+           "items": {
+             "type": "object",
+             "properties": {
+               "name": { "type": "string" },
+               "funding": { "type": "string" },
+               "founded": { "type": "string" }
+             }
+           }
+         }
+       }
+     }
+   }
+ }
+ ```
+
+ Then poll with `firecrawl_agent_status` using the returned job ID.
+
+ **Usage Example (with URLs - agent focuses on specific pages):**
+
+ ```json
+ {
+   "name": "firecrawl_agent",
+   "arguments": {
+     "urls": ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
+     "prompt": "Compare the features and pricing information from these pages"
+   }
+ }
+ ```
+
+ **Returns:**
+
+ - Job ID for status checking. Use `firecrawl_agent_status` to poll for results.
+
+ ### 10. Check Agent Status (`firecrawl_agent_status`)
+
+ Check the status of an agent job and retrieve results when complete. Use this to poll for results after starting an agent.
+
+ **Polling pattern:** Agent research can take minutes for complex queries. Poll this endpoint periodically (e.g., every 10-30 seconds) until status is "completed" or "failed".
+
+ ```json
+ {
+   "name": "firecrawl_agent_status",
+   "arguments": {
+     "id": "550e8400-e29b-41d4-a716-446655440000"
+   }
+ }
+ ```
+
+ **Possible statuses:**
+
+ - `processing`: Agent is still researching - check back later
+ - `completed`: Research finished - response includes the extracted data
+ - `failed`: An error occurred
+
+ ### 11. Browser Create (`firecrawl_browser_create`)
+
+ Create a persistent cloud browser session for interactive automation.
+
+ **Best for:**
+
+ - Multi-step browser automation (navigate, click, fill forms, extract data)
+ - Interactive workflows that require maintaining state across actions
+ - Testing and debugging web pages in a live browser
+
+ **Arguments:**
+
+ - `ttl`: Total session lifetime in seconds (30-3600, optional)
+ - `activityTtl`: Idle timeout in seconds (10-3600, optional)
+ - `streamWebView`: Whether to enable live view streaming (optional)
+
+ **Usage Example:**
+
+ ```json
+ {
+   "name": "firecrawl_browser_create",
+   "arguments": {
+     "ttl": 600
+   }
+ }
+ ```
+
+ **Returns:**
+
+ - Session ID, CDP URL, and live view URL
+
+ ### 12. Browser Execute (`firecrawl_browser_execute`)
+
+ Execute code in a browser session. Supports agent-browser commands (bash), Python, or JavaScript.
+
+ **Recommended: Use bash with agent-browser commands** (pre-installed in every sandbox):
+
+ ```json
+ {
+   "name": "firecrawl_browser_execute",
+   "arguments": {
+     "sessionId": "session-id-here",
+     "code": "agent-browser open https://example.com",
+     "language": "bash"
+   }
+ }
+ ```
+
+ **Common agent-browser commands:**
+
+ | Command | Description |
+ |---------|-------------|
+ | `agent-browser open <url>` | Navigate to URL |
+ | `agent-browser snapshot` | Accessibility tree with clickable refs |
+ | `agent-browser click @e5` | Click element by ref from snapshot |
+ | `agent-browser type @e3 "text"` | Type into element |
+ | `agent-browser get title` | Get page title |
+ | `agent-browser screenshot` | Take screenshot |
+ | `agent-browser --help` | Full command reference |
+
+ **For Playwright scripting, use Python:**
+
+ ```json
+ {
+   "name": "firecrawl_browser_execute",
+   "arguments": {
+     "sessionId": "session-id-here",
+     "code": "await page.goto('https://example.com')\ntitle = await page.title()\nprint(title)",
+     "language": "python"
+   }
+ }
+ ```
+
+ ### 13. Browser List (`firecrawl_browser_list`)
+
+ List browser sessions, optionally filtered by status.
+
+ ```json
+ {
+   "name": "firecrawl_browser_list",
+   "arguments": {
+     "status": "active"
+   }
+ }
+ ```
+
+ ### 14. Browser Delete (`firecrawl_browser_delete`)
+
+ Destroy a browser session.
+
+ ```json
+ {
+   "name": "firecrawl_browser_delete",
+   "arguments": {
+     "sessionId": "session-id-here"
+   }
+ }
+ ```
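Taken together, the browser tools form a create → execute → delete lifecycle. A minimal sketch of that sequence, assuming a generic `call_tool(name, arguments)` helper (hypothetical; a real MCP client provides its own tool-invocation API, and the stub below only echoes canned responses so the shape of the flow is visible):

```python
# Hypothetical stand-in for an MCP client's tool-invocation call.
def call_tool(name: str, arguments: dict) -> dict:
    if name == "firecrawl_browser_create":
        # The real tool returns a session ID, CDP URL, and live view URL.
        return {"sessionId": "session-abc"}
    return {"ok": True}

# 1. Create a session with a 10-minute lifetime.
session = call_tool("firecrawl_browser_create", {"ttl": 600})
sid = session["sessionId"]

# 2. Run agent-browser commands against the live browser, reusing
#    the same session so state persists across actions.
call_tool("firecrawl_browser_execute", {
    "sessionId": sid,
    "code": "agent-browser open https://example.com",
    "language": "bash",
})

# 3. Delete the session when done, rather than letting it linger
#    until its ttl or activityTtl expires.
call_tool("firecrawl_browser_delete", {"sessionId": sid})
```

Reusing one `sessionId` across execute calls is the point of the session model: each command sees the browser state left by the previous one.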
+
  ## Logging System
 
  The server includes comprehensive logging: