firecrawl-mcp 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2023 vrknetha
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,398 @@
+ # FireCrawl MCP Server
+
+ [![smithery badge](https://smithery.ai/badge/mcp-server-firecrawl)](https://smithery.ai/server/mcp-server-firecrawl)
+
+ A Model Context Protocol (MCP) server implementation that integrates with FireCrawl for advanced web scraping capabilities.
+
+ <a href="https://glama.ai/mcp/servers/57mideuljt"><img width="380" height="200" src="https://glama.ai/mcp/servers/57mideuljt/badge" alt="mcp-server-firecrawl MCP server" /></a>
+
+ ## Features
+
+ - Web scraping with JavaScript rendering
+ - Efficient batch processing with built-in rate limiting
+ - URL discovery and crawling
+ - Web search with content extraction
+ - Automatic retries with exponential backoff
+ - Credit usage monitoring for the cloud API
+ - Comprehensive logging system
+ - Support for cloud and self-hosted FireCrawl instances
+ - Mobile/desktop viewport support
+ - Smart content filtering with tag inclusion/exclusion
+
+ ## Installation
+
+ ### Installing via Smithery
+
+ To install FireCrawl for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@mendableai/mcp-server-firecrawl):
+
+ ```bash
+ npx -y @smithery/cli install @mendableai/mcp-server-firecrawl --client claude
+ ```
+
+ ### Manual Installation
+
+ ```bash
+ npm install -g mcp-server-firecrawl
+ ```
+
+ ## Configuration
+
+ ### Environment Variables
+
+ #### Required for Cloud API
+
+ - `FIRECRAWL_API_KEY`: Your FireCrawl API key
+   - Required when using the cloud API (default)
+   - Optional when using a self-hosted instance with `FIRECRAWL_API_URL`
+ - `FIRECRAWL_API_URL` (optional): Custom API endpoint for self-hosted instances
+   - Example: `https://firecrawl.your-domain.com`
+   - If not provided, the cloud API will be used (requires an API key)
+
+ #### Optional Configuration
+
+ ##### Retry Configuration
+
+ - `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum number of retry attempts (default: 3)
+ - `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds before the first retry (default: 1000)
+ - `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds between retries (default: 10000)
+ - `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2)
+
+ ##### Credit Usage Monitoring
+
+ - `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Credit usage warning threshold (default: 1000)
+ - `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Credit usage critical threshold (default: 100)
+
+ ### Configuration Examples
+
+ For cloud API usage with custom retry and credit monitoring:
+
+ ```bash
+ # Required for cloud API
+ export FIRECRAWL_API_KEY=your-api-key
+
+ # Optional retry configuration
+ export FIRECRAWL_RETRY_MAX_ATTEMPTS=5       # Increase max retry attempts
+ export FIRECRAWL_RETRY_INITIAL_DELAY=2000   # Start with a 2s delay
+ export FIRECRAWL_RETRY_MAX_DELAY=30000      # Maximum 30s delay
+ export FIRECRAWL_RETRY_BACKOFF_FACTOR=3     # More aggressive backoff
+
+ # Optional credit monitoring
+ export FIRECRAWL_CREDIT_WARNING_THRESHOLD=2000   # Warning at 2000 credits
+ export FIRECRAWL_CREDIT_CRITICAL_THRESHOLD=500   # Critical at 500 credits
+ ```
+
+ For a self-hosted instance:
+
+ ```bash
+ # Required for self-hosted
+ export FIRECRAWL_API_URL=https://firecrawl.your-domain.com
+
+ # Optional authentication for self-hosted
+ export FIRECRAWL_API_KEY=your-api-key   # If your instance requires auth
+
+ # Custom retry configuration
+ export FIRECRAWL_RETRY_MAX_ATTEMPTS=10
+ export FIRECRAWL_RETRY_INITIAL_DELAY=500   # Start with faster retries
+ ```
+
+ ### Usage with Claude Desktop
+
+ Add this to your `claude_desktop_config.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "mcp-server-firecrawl": {
+       "command": "npx",
+       "args": ["-y", "mcp-server-firecrawl"],
+       "env": {
+         "FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
+
+         "FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
+         "FIRECRAWL_RETRY_INITIAL_DELAY": "2000",
+         "FIRECRAWL_RETRY_MAX_DELAY": "30000",
+         "FIRECRAWL_RETRY_BACKOFF_FACTOR": "3",
+
+         "FIRECRAWL_CREDIT_WARNING_THRESHOLD": "2000",
+         "FIRECRAWL_CREDIT_CRITICAL_THRESHOLD": "500"
+       }
+     }
+   }
+ }
+ ```
+
+ ### System Configuration
+
+ The server includes several configurable parameters that can be set via environment variables. The defaults, if none are configured, are:
+
+ ```typescript
+ const CONFIG = {
+   retry: {
+     maxAttempts: 3,     // Number of retry attempts for rate-limited requests
+     initialDelay: 1000, // Initial delay before the first retry (in milliseconds)
+     maxDelay: 10000,    // Maximum delay between retries (in milliseconds)
+     backoffFactor: 2,   // Multiplier for exponential backoff
+   },
+   credit: {
+     warningThreshold: 1000, // Warn when credit usage reaches this level
+     criticalThreshold: 100, // Critical alert when credit usage reaches this level
+   },
+ };
+ ```
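
To make the mapping from environment variables to these defaults concrete, here is a minimal sketch. Only the variable names and default values come from this README; the `resolveRetryConfig` helper itself is illustrative and not part of the published package:

```javascript
// Sketch: resolve retry settings from environment variables, falling back to
// the documented defaults. Non-numeric values also fall back to the default.
function resolveRetryConfig(env = process.env) {
  const num = (name, fallback) => {
    const raw = env[name];
    const parsed = Number(raw);
    return raw !== undefined && Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    maxAttempts: num('FIRECRAWL_RETRY_MAX_ATTEMPTS', 3),
    initialDelay: num('FIRECRAWL_RETRY_INITIAL_DELAY', 1000),
    maxDelay: num('FIRECRAWL_RETRY_MAX_DELAY', 10000),
    backoffFactor: num('FIRECRAWL_RETRY_BACKOFF_FACTOR', 2),
  };
}
```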
+
+ These configurations control:
+
+ 1. **Retry Behavior**
+
+    - Automatically retries requests that fail due to rate limits
+    - Uses exponential backoff to avoid overwhelming the API
+    - Example: with the default settings, retries are attempted at:
+      - 1st retry: 1 second delay
+      - 2nd retry: 2 seconds delay
+      - 3rd retry: 4 seconds delay (capped at maxDelay)
+
+ 2. **Credit Usage Monitoring**
+    - Tracks API credit consumption for cloud API usage
+    - Provides warnings at the specified thresholds
+    - Helps prevent unexpected service interruption
+    - Example: with the default settings:
+      - Warning at 1000 credits remaining
+      - Critical alert at 100 credits remaining
+
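
The delay schedule above follows the standard exponential-backoff formula. A small sketch (the `retryDelay` function name is illustrative; the parameters match the configuration fields above):

```javascript
// Delay before the nth retry: initialDelay * backoffFactor^(n-1), capped at
// maxDelay. With the defaults this yields 1000, 2000, 4000 ms for retries 1-3.
function retryDelay(
  attempt,
  { initialDelay = 1000, backoffFactor = 2, maxDelay = 10000 } = {}
) {
  return Math.min(initialDelay * Math.pow(backoffFactor, attempt - 1), maxDelay);
}
```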
+
+ ### Rate Limiting and Batch Processing
+
+ The server utilizes FireCrawl's built-in rate limiting and batch processing capabilities:
+
+ - Automatic rate limit handling with exponential backoff
+ - Efficient parallel processing for batch operations
+ - Smart request queuing and throttling
+ - Automatic retries for transient errors
+
+ ## Available Tools
+
+ ### 1. Scrape Tool (`firecrawl_scrape`)
+
+ Scrape content from a single URL with advanced options.
+
+ ```json
+ {
+   "name": "firecrawl_scrape",
+   "arguments": {
+     "url": "https://example.com",
+     "formats": ["markdown"],
+     "onlyMainContent": true,
+     "waitFor": 1000,
+     "timeout": 30000,
+     "mobile": false,
+     "includeTags": ["article", "main"],
+     "excludeTags": ["nav", "footer"],
+     "skipTlsVerification": false
+   }
+ }
+ ```
+
+ ### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
+
+ Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.
+
+ ```json
+ {
+   "name": "firecrawl_batch_scrape",
+   "arguments": {
+     "urls": ["https://example1.com", "https://example2.com"],
+     "options": {
+       "formats": ["markdown"],
+       "onlyMainContent": true
+     }
+   }
+ }
+ ```
+
+ The response includes an operation ID for status checking:
+
+ ```json
+ {
+   "content": [
+     {
+       "type": "text",
+       "text": "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status to check progress."
+     }
+   ],
+   "isError": false
+ }
+ ```
+
+ ### 3. Check Batch Status (`firecrawl_check_batch_status`)
+
+ Check the status of a batch operation.
+
+ ```json
+ {
+   "name": "firecrawl_check_batch_status",
+   "arguments": {
+     "id": "batch_1"
+   }
+ }
+ ```
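
Together, the batch scrape and status tools support a queue-then-poll pattern. A sketch of that flow, where `callTool` is a hypothetical stand-in for your MCP client's tool invocation and the status-response shape (a `status` field reaching `"completed"`) is an assumption not documented above:

```javascript
// Queue a batch scrape, then poll firecrawl_check_batch_status until the
// operation completes. Tool names and arguments follow this README; the ID
// is parsed out of the human-readable confirmation message.
async function scrapeBatch(callTool, urls) {
  const queued = await callTool('firecrawl_batch_scrape', {
    urls,
    options: { formats: ['markdown'], onlyMainContent: true },
  });
  // e.g. "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status ..."
  const id = /ID: (\S+?)\./.exec(queued.content[0].text)[1];
  for (;;) {
    const status = await callTool('firecrawl_check_batch_status', { id });
    if (status.status === 'completed') return status;
    await new Promise((resolve) => setTimeout(resolve, 1000)); // poll interval
  }
}
```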
+
+ ### 4. Search Tool (`firecrawl_search`)
+
+ Search the web and optionally extract content from search results.
+
+ ```json
+ {
+   "name": "firecrawl_search",
+   "arguments": {
+     "query": "your search query",
+     "limit": 5,
+     "lang": "en",
+     "country": "us",
+     "scrapeOptions": {
+       "formats": ["markdown"],
+       "onlyMainContent": true
+     }
+   }
+ }
+ ```
+
+ ### 5. Crawl Tool (`firecrawl_crawl`)
+
+ Start an asynchronous crawl with advanced options.
+
+ ```json
+ {
+   "name": "firecrawl_crawl",
+   "arguments": {
+     "url": "https://example.com",
+     "maxDepth": 2,
+     "limit": 100,
+     "allowExternalLinks": false,
+     "deduplicateSimilarURLs": true
+   }
+ }
+ ```
+
+ ### 6. Extract Tool (`firecrawl_extract`)
+
+ Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
+
+ ```json
+ {
+   "name": "firecrawl_extract",
+   "arguments": {
+     "urls": ["https://example.com/page1", "https://example.com/page2"],
+     "prompt": "Extract product information including name, price, and description",
+     "systemPrompt": "You are a helpful assistant that extracts product information",
+     "schema": {
+       "type": "object",
+       "properties": {
+         "name": { "type": "string" },
+         "price": { "type": "number" },
+         "description": { "type": "string" }
+       },
+       "required": ["name", "price"]
+     },
+     "allowExternalLinks": false,
+     "enableWebSearch": false,
+     "includeSubdomains": false
+   }
+ }
+ ```
+
+ Example response:
+
+ ```json
+ {
+   "content": [
+     {
+       "type": "text",
+       "text": {
+         "name": "Example Product",
+         "price": 99.99,
+         "description": "This is an example product description"
+       }
+     }
+   ],
+   "isError": false
+ }
+ ```
+
+ #### Extract Tool Options
+
+ - `urls`: Array of URLs to extract information from
+ - `prompt`: Custom prompt for the LLM extraction
+ - `systemPrompt`: System prompt to guide the LLM
+ - `schema`: JSON schema for structured data extraction
+ - `allowExternalLinks`: Allow extraction from external links
+ - `enableWebSearch`: Enable web search for additional context
+ - `includeSubdomains`: Include subdomains in extraction
+
+ When using a self-hosted instance, extraction uses your configured LLM. With the cloud API, it uses FireCrawl's managed LLM service.
+
+ ## Logging System
+
+ The server includes comprehensive logging:
+
+ - Operation status and progress
+ - Performance metrics
+ - Credit usage monitoring
+ - Rate limit tracking
+ - Error conditions
+
+ Example log messages:
+
+ ```
+ [INFO] FireCrawl MCP Server initialized successfully
+ [INFO] Starting scrape for URL: https://example.com
+ [INFO] Batch operation queued with ID: batch_1
+ [WARNING] Credit usage has reached warning threshold
+ [ERROR] Rate limit exceeded, retrying in 2s...
+ ```
+
+ ## Error Handling
+
+ The server provides robust error handling:
+
+ - Automatic retries for transient errors
+ - Rate limit handling with backoff
+ - Detailed error messages
+ - Credit usage warnings
+ - Network resilience
+
+ Example error response:
+
+ ```json
+ {
+   "content": [
+     {
+       "type": "text",
+       "text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
+     }
+   ],
+   "isError": true
+ }
+ ```
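
A generic retry-with-backoff wrapper of the kind this error handling describes can be sketched as follows. The `withRetries` helper is illustrative, not the server's actual implementation; the parameter names and defaults follow the configuration section above:

```javascript
// Retry an async operation with exponential backoff. After each failure the
// delay grows by backoffFactor, capped at maxDelay; once maxAttempts is
// exhausted the last error is rethrown to the caller.
async function withRetries(
  operation,
  { maxAttempts = 3, initialDelay = 1000, backoffFactor = 2, maxDelay = 10000 } = {}
) {
  let delay = initialDelay;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // out of attempts: surface the error
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = Math.min(delay * backoffFactor, maxDelay);
    }
  }
}
```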
+
+ ## Development
+
+ ```bash
+ # Install dependencies
+ npm install
+
+ # Build
+ npm run build
+
+ # Run tests
+ npm test
+ ```
+
+ ### Contributing
+
+ 1. Fork the repository
+ 2. Create your feature branch
+ 3. Run tests: `npm test`
+ 4. Submit a pull request
+
+ ## License
+
+ MIT License - see LICENSE file for details
@@ -0,0 +1,58 @@
+ import { jest } from '@jest/globals';
+ // Set test timeout
+ jest.setTimeout(30000);
+ // Create mock responses
+ const mockSearchResponse = {
+   success: true,
+   data: [
+     {
+       url: 'https://example.com',
+       title: 'Test Page',
+       description: 'Test Description',
+       markdown: '# Test Content',
+       actions: null,
+     },
+   ],
+ };
+ const mockBatchScrapeResponse = {
+   success: true,
+   id: 'test-batch-id',
+ };
+ const mockBatchStatusResponse = {
+   success: true,
+   status: 'completed',
+   completed: 1,
+   total: 1,
+   creditsUsed: 1,
+   expiresAt: new Date(),
+   data: [
+     {
+       url: 'https://example.com',
+       title: 'Test Page',
+       description: 'Test Description',
+       markdown: '# Test Content',
+       actions: null,
+     },
+   ],
+ };
+ // Create mock instance methods
+ const mockSearch = jest.fn().mockImplementation(async () => mockSearchResponse);
+ const mockAsyncBatchScrapeUrls = jest
+   .fn()
+   .mockImplementation(async () => mockBatchScrapeResponse);
+ const mockCheckBatchScrapeStatus = jest
+   .fn()
+   .mockImplementation(async () => mockBatchStatusResponse);
+ // Create mock instance
+ const mockInstance = {
+   apiKey: 'test-api-key',
+   apiUrl: 'test-api-url',
+   search: mockSearch,
+   asyncBatchScrapeUrls: mockAsyncBatchScrapeUrls,
+   checkBatchScrapeStatus: mockCheckBatchScrapeStatus,
+ };
+ // Mock the module
+ jest.mock('@mendable/firecrawl-js', () => ({
+   __esModule: true,
+   default: jest.fn().mockImplementation(() => mockInstance),
+ }));