firecrawl-mcp 3.7.4 → 3.9.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +0 -0
- package/README.md +270 -22
- package/dist/index-v1.js +1313 -0
- package/dist/index.js +290 -35
- package/dist/index.test.js +255 -0
- package/dist/jest.setup.js +58 -0
- package/dist/server-v1.js +1154 -0
- package/dist/server-v2.js +1067 -0
- package/dist/src/index.js +1053 -0
- package/dist/src/index.test.js +225 -0
- package/dist/versioned-server.js +203 -0
- package/package.json +2 -2
package/LICENSE
CHANGED
File without changes
package/README.md
CHANGED
@@ -17,6 +17,7 @@ A Model Context Protocol (MCP) server implementation that integrates with [Firec
 - Web scraping, crawling, and discovery
 - Search and content extraction
 - Deep research and batch scraping
+- Cloud browser sessions with agent-browser automation
 - Automatic retries and rate limiting
 - Cloud and self-hosted support
 - SSE support
@@ -310,23 +311,32 @@ The server utilizes Firecrawl's built-in rate limiting and batch processing capa
 Use this guide to select the right tool for your task:
 
 - **If you know the exact URL(s) you want:**
-  - For one: use **scrape**
+  - For one: use **scrape** (with JSON format for structured data)
   - For many: use **batch_scrape**
 - **If you need to discover URLs on a site:** use **map**
 - **If you want to search the web for info:** use **search**
-- **If you
+- **If you need complex research across multiple unknown sources:** use **agent**
 - **If you want to analyze a whole site or section:** use **crawl** (with limits!)
+- **If you need interactive browser automation** (click, type, navigate): use **browser**
 
 ### Quick Reference Table
 
-| Tool | Best for | Returns
-| ------------ | ----------------------------------- |
-| scrape | Single page content | markdown
-| batch_scrape | Multiple known URLs | markdown
-| map | Discovering URLs on a site | URL[]
-| crawl | Multi-page extraction (with limits) | markdown/html[]
-| search | Web search for info | results[]
-
+| Tool | Best for | Returns |
+| ------------ | ----------------------------------- | -------------------------- |
+| scrape | Single page content | JSON (preferred) or markdown |
+| batch_scrape | Multiple known URLs | JSON (preferred) or markdown[] |
+| map | Discovering URLs on a site | URL[] |
+| crawl | Multi-page extraction (with limits) | markdown/html[] |
+| search | Web search for info | results[] |
+| agent | Complex multi-source research | JSON (structured data) |
+| browser | Interactive multi-step automation | Session with live browser |
+
+### Format Selection Guide
+
+When using `scrape` or `batch_scrape`, choose the right format:
+
+- **JSON format (recommended for most cases):** Use when you need specific data from a page. Define a schema based on what you need to extract. This keeps responses small and avoids context window overflow.
+- **Markdown format (use sparingly):** Only when you genuinely need the full page content, such as reading an entire article for summarization or analyzing page structure.
 
 ## Available Tools
 
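The tool-selection guide and the format guide added in this hunk amount to a small decision procedure. The Python sketch below encodes them for illustration only: the `Task` fields and the `choose_tool` / `choose_format` helpers are hypothetical names that are not part of firecrawl-mcp, and the order in which the checks run is an assumption, because the README lists the cases without saying which one wins when several apply.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical task description; the field names are illustrative only."""
    urls: List[str] = field(default_factory=list)  # exact URLs already known
    discover_urls: bool = False          # need to find URLs on a site
    web_search: bool = False             # need to search the web for info
    multi_source_research: bool = False  # complex research across unknown sources
    whole_site: bool = False             # analyze a whole site or section
    interactive: bool = False            # click, type, navigate


def choose_tool(task: Task) -> str:
    """Return the README's tool name for a task (check order is an assumption)."""
    if task.interactive:
        return "browser"
    if task.urls:
        # One known URL -> scrape; several known URLs -> batch_scrape.
        return "scrape" if len(task.urls) == 1 else "batch_scrape"
    if task.discover_urls:
        return "map"
    if task.web_search:
        return "search"
    if task.multi_source_research:
        return "agent"
    if task.whole_site:
        return "crawl"  # the guide adds: with limits!
    raise ValueError("task matches none of the cases in the guide")


def choose_format(needs_full_content: bool) -> str:
    """Format for scrape/batch_scrape: JSON by default, markdown only for full content."""
    return "markdown" if needs_full_content else "json"


print(choose_tool(Task(urls=["https://example.com/product"])), choose_format(False))
# prints: scrape json
```

Encoding the guide this way makes the one ambiguity visible: a task that is both interactive and has known URLs needs an explicit precedence rule, which the README leaves to the reader.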
@@ -342,38 +352,75 @@ Scrape content from a single URL with advanced options.
 
 - Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
 - When you're unsure which page contains the information (use search)
-- When you need structured data (use extract)
 
 **Common mistakes:**
 
 - Using scrape for a list of URLs (use batch_scrape instead).
+- Using markdown format by default (use JSON format to extract only what you need).
+
+**Choosing the right format:**
+
+- **JSON format (preferred):** For most use cases, use JSON format with a schema to extract only the specific data needed. This keeps responses focused and prevents context window overflow.
+- **Markdown format:** Only when the task genuinely requires full page content (e.g., summarizing an entire article, analyzing page structure).
 
 **Prompt Example:**
 
-> "Get the
+> "Get the product details from https://example.com/product."
 
-**Usage Example:**
+**Usage Example (JSON format - preferred):**
 
 ```json
 {
   "name": "firecrawl_scrape",
   "arguments": {
-    "url": "https://example.com",
+    "url": "https://example.com/product",
+    "formats": [{
+      "type": "json",
+      "prompt": "Extract the product information",
+      "schema": {
+        "type": "object",
+        "properties": {
+          "name": { "type": "string" },
+          "price": { "type": "number" },
+          "description": { "type": "string" }
+        },
+        "required": ["name", "price"]
+      }
+    }]
+  }
+}
+```
+
+**Usage Example (markdown format - when full content needed):**
+
+```json
+{
+  "name": "firecrawl_scrape",
+  "arguments": {
+    "url": "https://example.com/article",
     "formats": ["markdown"],
-    "onlyMainContent": true
-
-
-
-
-
-
+    "onlyMainContent": true
+  }
+}
+```
+
+**Usage Example (branding format - extract brand identity):**
+
+```json
+{
+  "name": "firecrawl_scrape",
+  "arguments": {
+    "url": "https://example.com",
+    "formats": ["branding"]
   }
 }
 ```
 
+**Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication.
+
 **Returns:**
 
--
+- JSON structured data, markdown, branding profile, or other formats as specified.
 
 ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
 
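The JSON-format scrape call introduced in this hunk can also be assembled programmatically. The following is a minimal Python sketch that assumes only the argument shape shown in the README examples in this hunk (`url` plus `formats` holding a `type`/`prompt`/`schema` object, or `["markdown"]` with `onlyMainContent`). `scrape_call` and `missing_required` are hypothetical helpers, and `missing_required` checks only the schema's top-level `required` list; it is not a full JSON Schema validator.

```python
import json
from typing import Any, Dict, List, Optional

# Schema copied from the README's JSON-format example.
PRODUCT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
    },
    "required": ["name", "price"],
}


def scrape_call(url: str, schema: Optional[Dict[str, Any]] = None,
                prompt: Optional[str] = None) -> Dict[str, Any]:
    """Build a firecrawl_scrape tool call (hypothetical helper).

    With a schema, request the JSON format the README recommends;
    without one, fall back to main-content markdown.
    """
    if schema is not None:
        fmt: Dict[str, Any] = {"type": "json"}
        if prompt:
            fmt["prompt"] = prompt
        fmt["schema"] = schema
        arguments: Dict[str, Any] = {"url": url, "formats": [fmt]}
    else:
        arguments = {"url": url, "formats": ["markdown"], "onlyMainContent": True}
    return {"name": "firecrawl_scrape", "arguments": arguments}


def missing_required(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """List top-level required keys absent from extracted data (not full validation)."""
    return [key for key in schema.get("required", []) if key not in data]


call = scrape_call("https://example.com/product", PRODUCT_SCHEMA,
                   "Extract the product information")
print(json.dumps(call, indent=2))
```

Keeping the schema small and marking only essential fields as `required` follows the README's reasoning that JSON format keeps responses compact; a check like `missing_required` is one cheap way to notice when an extraction came back incomplete.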
@@ -667,6 +714,207 @@ When using a self-hosted instance, the extraction will use your configured LLM.
 }
 ```
 
+### 9. Agent Tool (`firecrawl_agent`)
+
+Autonomous web research agent. This is a separate AI agent layer that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query.
+
+**How it works:**
+
+The agent performs web searches, follows links, reads pages, and gathers data autonomously. This runs **asynchronously** - it returns a job ID immediately, and you poll `firecrawl_agent_status` to check when complete and retrieve results.
+
+**Async workflow:**
+
+1. Call `firecrawl_agent` with your prompt/schema → returns job ID
+2. Do other work while the agent researches (can take minutes for complex queries)
+3. Poll `firecrawl_agent_status` with the job ID to check progress
+4. When status is "completed", the response includes the extracted data
+
+**Best for:**
+
+- Complex research tasks where you don't know the exact URLs
+- Multi-source data gathering
+- Finding information scattered across the web
+- Tasks where you can do other work while waiting for results
+
+**Not recommended for:**
+
+- Simple single-page scraping where you know the URL (use scrape with JSON format - faster and cheaper)
+
+**Arguments:**
+
+- `prompt`: Natural language description of the data you want (required, max 10,000 characters)
+- `urls`: Optional array of URLs to focus the agent on specific pages
+- `schema`: Optional JSON schema for structured output
+
+**Prompt Example:**
+
+> "Find the founders of Firecrawl and their backgrounds"
+
+**Usage Example (start agent, then poll for results):**
+
+```json
+{
+  "name": "firecrawl_agent",
+  "arguments": {
+    "prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts",
+    "schema": {
+      "type": "object",
+      "properties": {
+        "startups": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "properties": {
+              "name": { "type": "string" },
+              "funding": { "type": "string" },
+              "founded": { "type": "string" }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+Then poll with `firecrawl_agent_status` using the returned job ID.
+
+**Usage Example (with URLs - agent focuses on specific pages):**
+
+```json
+{
+  "name": "firecrawl_agent",
+  "arguments": {
+    "urls": ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
+    "prompt": "Compare the features and pricing information from these pages"
+  }
+}
+```
+
+**Returns:**
+
+- Job ID for status checking. Use `firecrawl_agent_status` to poll for results.
+
+### 10. Check Agent Status (`firecrawl_agent_status`)
+
+Check the status of an agent job and retrieve results when complete. Use this to poll for results after starting an agent.
+
+**Polling pattern:** Agent research can take minutes for complex queries. Poll this endpoint periodically (e.g., every 10-30 seconds) until status is "completed" or "failed".
+
+```json
+{
+  "name": "firecrawl_agent_status",
+  "arguments": {
+    "id": "550e8400-e29b-41d4-a716-446655440000"
+  }
+}
+```
+
+**Possible statuses:**
+
+- `processing`: Agent is still researching - check back later
+- `completed`: Research finished - response includes the extracted data
+- `failed`: An error occurred
+
+### 11. Browser Create (`firecrawl_browser_create`)
+
+Create a persistent cloud browser session for interactive automation.
+
+**Best for:**
+
+- Multi-step browser automation (navigate, click, fill forms, extract data)
+- Interactive workflows that require maintaining state across actions
+- Testing and debugging web pages in a live browser
+
+**Arguments:**
+
+- `ttl`: Total session lifetime in seconds (30-3600, optional)
+- `activityTtl`: Idle timeout in seconds (10-3600, optional)
+- `streamWebView`: Whether to enable live view streaming (optional)
+
+**Usage Example:**
+
+```json
+{
+  "name": "firecrawl_browser_create",
+  "arguments": {
+    "ttl": 600
+  }
+}
+```
+
+**Returns:**
+
+- Session ID, CDP URL, and live view URL
+
+### 12. Browser Execute (`firecrawl_browser_execute`)
+
+Execute code in a browser session. Supports agent-browser commands (bash), Python, or JavaScript.
+
+**Recommended: Use bash with agent-browser commands** (pre-installed in every sandbox):
+
+```json
+{
+  "name": "firecrawl_browser_execute",
+  "arguments": {
+    "sessionId": "session-id-here",
+    "code": "agent-browser open https://example.com",
+    "language": "bash"
+  }
+}
+```
+
+**Common agent-browser commands:**
+
+| Command | Description |
+|---------|-------------|
+| `agent-browser open <url>` | Navigate to URL |
+| `agent-browser snapshot` | Accessibility tree with clickable refs |
+| `agent-browser click @e5` | Click element by ref from snapshot |
+| `agent-browser type @e3 "text"` | Type into element |
+| `agent-browser get title` | Get page title |
+| `agent-browser screenshot` | Take screenshot |
+| `agent-browser --help` | Full command reference |
+
+**For Playwright scripting, use Python:**
+
+```json
+{
+  "name": "firecrawl_browser_execute",
+  "arguments": {
+    "sessionId": "session-id-here",
+    "code": "await page.goto('https://example.com')\ntitle = await page.title()\nprint(title)",
+    "language": "python"
+  }
+}
+```
+
+### 13. Browser List (`firecrawl_browser_list`)
+
+List browser sessions, optionally filtered by status.
+
+```json
+{
+  "name": "firecrawl_browser_list",
+  "arguments": {
+    "status": "active"
+  }
+}
+```
+
+### 14. Browser Delete (`firecrawl_browser_delete`)
+
+Destroy a browser session.
+
+```json
+{
+  "name": "firecrawl_browser_delete",
+  "arguments": {
+    "sessionId": "session-id-here"
+  }
+}
+```
+
 ## Logging System
 
 The server includes comprehensive logging: