firecrawl-mcp 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2023 vrknetha
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,398 @@
+ # FireCrawl MCP Server
+
+ [![smithery badge](https://smithery.ai/badge/mcp-server-firecrawl)](https://smithery.ai/server/mcp-server-firecrawl)
+
+ A Model Context Protocol (MCP) server implementation that integrates with FireCrawl for advanced web scraping capabilities.
+
+ <a href="https://glama.ai/mcp/servers/57mideuljt"><img width="380" height="200" src="https://glama.ai/mcp/servers/57mideuljt/badge" alt="mcp-server-firecrawl MCP server" /></a>
+
+ ## Features
+
+ - Web scraping with JavaScript rendering
+ - Efficient batch processing with built-in rate limiting
+ - URL discovery and crawling
+ - Web search with content extraction
+ - Automatic retries with exponential backoff
+ - Credit usage monitoring for the cloud API
+ - Comprehensive logging system
+ - Support for cloud and self-hosted FireCrawl instances
+ - Mobile/desktop viewport support
+ - Smart content filtering with tag inclusion/exclusion
+
+ ## Installation
+
+ ### Installing via Smithery
+
+ To install FireCrawl for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@mendableai/mcp-server-firecrawl):
+
+ ```bash
+ npx -y @smithery/cli install @mendableai/mcp-server-firecrawl --client claude
+ ```
+
+ ### Manual Installation
+
+ ```bash
+ npm install -g mcp-server-firecrawl
+ ```
+
+ ## Configuration
+
+ ### Environment Variables
+
+ #### Required for Cloud API
+
+ - `FIRECRAWL_API_KEY`: Your FireCrawl API key
+   - Required when using the cloud API (default)
+   - Optional when using a self-hosted instance with `FIRECRAWL_API_URL`
+ - `FIRECRAWL_API_URL` (optional): Custom API endpoint for self-hosted instances
+   - Example: `https://firecrawl.your-domain.com`
+   - If not provided, the cloud API will be used (requires an API key)
+
+ #### Optional Configuration
+
+ ##### Retry Configuration
+
+ - `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum number of retry attempts (default: 3)
+ - `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds before the first retry (default: 1000)
+ - `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds between retries (default: 10000)
+ - `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2)
+
+ ##### Credit Usage Monitoring
+
+ - `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Credit usage warning threshold (default: 1000)
+ - `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Credit usage critical threshold (default: 100)
+
+ ### Configuration Examples
+
+ For cloud API usage with custom retry and credit monitoring:
+
+ ```bash
+ # Required for cloud API
+ export FIRECRAWL_API_KEY=your-api-key
+
+ # Optional retry configuration
+ export FIRECRAWL_RETRY_MAX_ATTEMPTS=5       # Increase max retry attempts
+ export FIRECRAWL_RETRY_INITIAL_DELAY=2000   # Start with a 2s delay
+ export FIRECRAWL_RETRY_MAX_DELAY=30000      # Maximum 30s delay
+ export FIRECRAWL_RETRY_BACKOFF_FACTOR=3     # More aggressive backoff
+
+ # Optional credit monitoring
+ export FIRECRAWL_CREDIT_WARNING_THRESHOLD=2000   # Warning at 2000 credits
+ export FIRECRAWL_CREDIT_CRITICAL_THRESHOLD=500   # Critical at 500 credits
+ ```
+
+ For a self-hosted instance:
+
+ ```bash
+ # Required for self-hosted
+ export FIRECRAWL_API_URL=https://firecrawl.your-domain.com
+
+ # Optional authentication for self-hosted
+ export FIRECRAWL_API_KEY=your-api-key   # If your instance requires auth
+
+ # Custom retry configuration
+ export FIRECRAWL_RETRY_MAX_ATTEMPTS=10
+ export FIRECRAWL_RETRY_INITIAL_DELAY=500   # Start with faster retries
+ ```
+
+ ### Usage with Claude Desktop
+
+ Add this to your `claude_desktop_config.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "mcp-server-firecrawl": {
+       "command": "npx",
+       "args": ["-y", "mcp-server-firecrawl"],
+       "env": {
+         "FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
+
+         "FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
+         "FIRECRAWL_RETRY_INITIAL_DELAY": "2000",
+         "FIRECRAWL_RETRY_MAX_DELAY": "30000",
+         "FIRECRAWL_RETRY_BACKOFF_FACTOR": "3",
+
+         "FIRECRAWL_CREDIT_WARNING_THRESHOLD": "2000",
+         "FIRECRAWL_CREDIT_CRITICAL_THRESHOLD": "500"
+       }
+     }
+   }
+ }
+ ```
+
+ ### System Configuration
+
+ The server includes several configurable parameters that can be set via environment variables. The defaults, if none are configured, are:
+
+ ```typescript
+ const CONFIG = {
+   retry: {
+     maxAttempts: 3,     // Number of retry attempts for rate-limited requests
+     initialDelay: 1000, // Initial delay before the first retry (in milliseconds)
+     maxDelay: 10000,    // Maximum delay between retries (in milliseconds)
+     backoffFactor: 2,   // Multiplier for exponential backoff
+   },
+   credit: {
+     warningThreshold: 1000, // Warn when credit usage reaches this level
+     criticalThreshold: 100, // Critical alert when credit usage reaches this level
+   },
+ };
+ ```
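
To make the mapping from environment variables to these defaults concrete, here is a minimal sketch. Only the variable names and default values come from this README; the `resolveRetryConfig` helper itself is illustrative and not part of the published package:

```javascript
// Sketch: resolve retry settings from environment variables, falling back to
// the documented defaults. Non-numeric values also fall back to the default.
function resolveRetryConfig(env = process.env) {
  const num = (name, fallback) => {
    const raw = env[name];
    const parsed = Number(raw);
    return raw !== undefined && Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    maxAttempts: num('FIRECRAWL_RETRY_MAX_ATTEMPTS', 3),
    initialDelay: num('FIRECRAWL_RETRY_INITIAL_DELAY', 1000),
    maxDelay: num('FIRECRAWL_RETRY_MAX_DELAY', 10000),
    backoffFactor: num('FIRECRAWL_RETRY_BACKOFF_FACTOR', 2),
  };
}
```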
+
+ These configurations control:
+
+ 1. **Retry Behavior**
+
+    - Automatically retries requests that fail due to rate limits
+    - Uses exponential backoff to avoid overwhelming the API
+    - Example: with the default settings, retries are attempted at:
+      - 1st retry: 1 second delay
+      - 2nd retry: 2 seconds delay
+      - 3rd retry: 4 seconds delay (capped at maxDelay)
+
+ 2. **Credit Usage Monitoring**
+    - Tracks API credit consumption for cloud API usage
+    - Provides warnings at the specified thresholds
+    - Helps prevent unexpected service interruption
+    - Example: with the default settings:
+      - Warning at 1000 credits remaining
+      - Critical alert at 100 credits remaining
+
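
The delay schedule above follows the standard exponential-backoff formula. A small sketch (the `retryDelay` function name is illustrative; the parameters match the configuration fields above):

```javascript
// Delay before the nth retry: initialDelay * backoffFactor^(n-1), capped at
// maxDelay. With the defaults this yields 1000, 2000, 4000 ms for retries 1-3.
function retryDelay(
  attempt,
  { initialDelay = 1000, backoffFactor = 2, maxDelay = 10000 } = {}
) {
  return Math.min(initialDelay * Math.pow(backoffFactor, attempt - 1), maxDelay);
}
```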
+
+ ### Rate Limiting and Batch Processing
+
+ The server utilizes FireCrawl's built-in rate limiting and batch processing capabilities:
+
+ - Automatic rate limit handling with exponential backoff
+ - Efficient parallel processing for batch operations
+ - Smart request queuing and throttling
+ - Automatic retries for transient errors
+
+ ## Available Tools
+
+ ### 1. Scrape Tool (`firecrawl_scrape`)
+
+ Scrape content from a single URL with advanced options.
+
+ ```json
+ {
+   "name": "firecrawl_scrape",
+   "arguments": {
+     "url": "https://example.com",
+     "formats": ["markdown"],
+     "onlyMainContent": true,
+     "waitFor": 1000,
+     "timeout": 30000,
+     "mobile": false,
+     "includeTags": ["article", "main"],
+     "excludeTags": ["nav", "footer"],
+     "skipTlsVerification": false
+   }
+ }
+ ```
+
+ ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
+
+ Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.
+
+ ```json
+ {
+   "name": "firecrawl_batch_scrape",
+   "arguments": {
+     "urls": ["https://example1.com", "https://example2.com"],
+     "options": {
+       "formats": ["markdown"],
+       "onlyMainContent": true
+     }
+   }
+ }
+ ```
+
+ The response includes an operation ID for status checking:
+
+ ```json
+ {
+   "content": [
+     {
+       "type": "text",
+       "text": "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status to check progress."
+     }
+   ],
+   "isError": false
+ }
+ ```
+
+ ### 3. Check Batch Status (`firecrawl_check_batch_status`)
+
+ Check the status of a batch operation.
+
+ ```json
+ {
+   "name": "firecrawl_check_batch_status",
+   "arguments": {
+     "id": "batch_1"
+   }
+ }
+ ```
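
Together, the batch scrape and status tools support a queue-then-poll pattern. A sketch of that flow, where `callTool` is a hypothetical stand-in for your MCP client's tool invocation and the status-response shape (a `status` field reaching `"completed"`) is an assumption not documented above:

```javascript
// Queue a batch scrape, then poll firecrawl_check_batch_status until the
// operation completes. Tool names and arguments follow this README; the ID
// is parsed out of the human-readable confirmation message.
async function scrapeBatch(callTool, urls) {
  const queued = await callTool('firecrawl_batch_scrape', {
    urls,
    options: { formats: ['markdown'], onlyMainContent: true },
  });
  // e.g. "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status ..."
  const id = /ID: (\S+?)\./.exec(queued.content[0].text)[1];
  for (;;) {
    const status = await callTool('firecrawl_check_batch_status', { id });
    if (status.status === 'completed') return status;
    await new Promise((resolve) => setTimeout(resolve, 1000)); // poll interval
  }
}
```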
+
+ ### 4. Search Tool (`firecrawl_search`)
+
+ Search the web and optionally extract content from search results.
+
+ ```json
+ {
+   "name": "firecrawl_search",
+   "arguments": {
+     "query": "your search query",
+     "limit": 5,
+     "lang": "en",
+     "country": "us",
+     "scrapeOptions": {
+       "formats": ["markdown"],
+       "onlyMainContent": true
+     }
+   }
+ }
+ ```
+
+ ### 5. Crawl Tool (`firecrawl_crawl`)
+
+ Start an asynchronous crawl with advanced options.
+
+ ```json
+ {
+   "name": "firecrawl_crawl",
+   "arguments": {
+     "url": "https://example.com",
+     "maxDepth": 2,
+     "limit": 100,
+     "allowExternalLinks": false,
+     "deduplicateSimilarURLs": true
+   }
+ }
+ ```
+
+ ### 6. Extract Tool (`firecrawl_extract`)
+
+ Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
+
+ ```json
+ {
+   "name": "firecrawl_extract",
+   "arguments": {
+     "urls": ["https://example.com/page1", "https://example.com/page2"],
+     "prompt": "Extract product information including name, price, and description",
+     "systemPrompt": "You are a helpful assistant that extracts product information",
+     "schema": {
+       "type": "object",
+       "properties": {
+         "name": { "type": "string" },
+         "price": { "type": "number" },
+         "description": { "type": "string" }
+       },
+       "required": ["name", "price"]
+     },
+     "allowExternalLinks": false,
+     "enableWebSearch": false,
+     "includeSubdomains": false
+   }
+ }
+ ```
+
+ Example response:
+
+ ```json
+ {
+   "content": [
+     {
+       "type": "text",
+       "text": {
+         "name": "Example Product",
+         "price": 99.99,
+         "description": "This is an example product description"
+       }
+     }
+   ],
+   "isError": false
+ }
+ ```
+
+ #### Extract Tool Options
+
+ - `urls`: Array of URLs to extract information from
+ - `prompt`: Custom prompt for the LLM extraction
+ - `systemPrompt`: System prompt to guide the LLM
+ - `schema`: JSON schema for structured data extraction
+ - `allowExternalLinks`: Allow extraction from external links
+ - `enableWebSearch`: Enable web search for additional context
+ - `includeSubdomains`: Include subdomains in extraction
+
+ When using a self-hosted instance, extraction uses your configured LLM. With the cloud API, it uses FireCrawl's managed LLM service.
+
+ ## Logging System
+
+ The server includes comprehensive logging:
+
+ - Operation status and progress
+ - Performance metrics
+ - Credit usage monitoring
+ - Rate limit tracking
+ - Error conditions
+
+ Example log messages:
+
+ ```
+ [INFO] FireCrawl MCP Server initialized successfully
+ [INFO] Starting scrape for URL: https://example.com
+ [INFO] Batch operation queued with ID: batch_1
+ [WARNING] Credit usage has reached warning threshold
+ [ERROR] Rate limit exceeded, retrying in 2s...
+ ```
+
+ ## Error Handling
+
+ The server provides robust error handling:
+
+ - Automatic retries for transient errors
+ - Rate limit handling with backoff
+ - Detailed error messages
+ - Credit usage warnings
+ - Network resilience
+
+ Example error response:
+
+ ```json
+ {
+   "content": [
+     {
+       "type": "text",
+       "text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
+     }
+   ],
+   "isError": true
+ }
+ ```
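
A generic retry-with-backoff wrapper of the kind this error handling describes can be sketched as follows. The `withRetries` helper is illustrative, not the server's actual implementation; the parameter names and defaults follow the configuration section above:

```javascript
// Retry an async operation with exponential backoff. After each failure the
// delay grows by backoffFactor, capped at maxDelay; once maxAttempts is
// exhausted the last error is rethrown to the caller.
async function withRetries(
  operation,
  { maxAttempts = 3, initialDelay = 1000, backoffFactor = 2, maxDelay = 10000 } = {}
) {
  let delay = initialDelay;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // out of attempts: surface the error
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = Math.min(delay * backoffFactor, maxDelay);
    }
  }
}
```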
+
+ ## Development
+
+ ```bash
+ # Install dependencies
+ npm install
+
+ # Build
+ npm run build
+
+ # Run tests
+ npm test
+ ```
+
+ ### Contributing
+
+ 1. Fork the repository
+ 2. Create your feature branch
+ 3. Run tests: `npm test`
+ 4. Submit a pull request
+
+ ## License
+
+ MIT License - see LICENSE file for details
@@ -0,0 +1,58 @@
+ import { jest } from '@jest/globals';
+ // Set test timeout
+ jest.setTimeout(30000);
+ // Create mock responses
+ const mockSearchResponse = {
+   success: true,
+   data: [
+     {
+       url: 'https://example.com',
+       title: 'Test Page',
+       description: 'Test Description',
+       markdown: '# Test Content',
+       actions: null,
+     },
+   ],
+ };
+ const mockBatchScrapeResponse = {
+   success: true,
+   id: 'test-batch-id',
+ };
+ const mockBatchStatusResponse = {
+   success: true,
+   status: 'completed',
+   completed: 1,
+   total: 1,
+   creditsUsed: 1,
+   expiresAt: new Date(),
+   data: [
+     {
+       url: 'https://example.com',
+       title: 'Test Page',
+       description: 'Test Description',
+       markdown: '# Test Content',
+       actions: null,
+     },
+   ],
+ };
+ // Create mock instance methods
+ const mockSearch = jest.fn().mockImplementation(async () => mockSearchResponse);
+ const mockAsyncBatchScrapeUrls = jest
+   .fn()
+   .mockImplementation(async () => mockBatchScrapeResponse);
+ const mockCheckBatchScrapeStatus = jest
+   .fn()
+   .mockImplementation(async () => mockBatchStatusResponse);
+ // Create mock instance
+ const mockInstance = {
+   apiKey: 'test-api-key',
+   apiUrl: 'test-api-url',
+   search: mockSearch,
+   asyncBatchScrapeUrls: mockAsyncBatchScrapeUrls,
+   checkBatchScrapeStatus: mockCheckBatchScrapeStatus,
+ };
+ // Mock the module
+ jest.mock('@mendable/firecrawl-js', () => ({
+   __esModule: true,
+   default: jest.fn().mockImplementation(() => mockInstance),
+ }));