npm - crawlforge-mcp-server - Versions diffs - 3.0.0 → 3.0.1 - Mend

crawlforge-mcp-server 3.0.0 → 3.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2) hide show

package/CLAUDE.md +143 -46
package/package.json +4 -1

package/CLAUDE.md CHANGED Viewed

@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
-CrawlForge MCP Server - A professional MCP (Model Context Protocol) server implementation providing 16 comprehensive web scraping, crawling, and content processing tools. Version 3.0 includes advanced content extraction, document processing, summarization, and analysis capabilities. Wave 2 adds asynchronous batch processing and browser automation features. Wave 3 introduces deep research orchestration, stealth scraping, localization, and change tracking.
+CrawlForge MCP Server - A professional MCP (Model Context Protocol) server implementation providing 18+ comprehensive web scraping, crawling, and content processing tools. Version 3.0 includes advanced content extraction, document processing, summarization, and analysis capabilities. Wave 2 adds asynchronous batch processing and browser automation features. Wave 3 introduces deep research orchestration, stealth scraping, localization, and change tracking.
 ## Development Commands
@@ -12,42 +12,30 @@ CrawlForge MCP Server - A professional MCP (Model Context Protocol) server imple
 # Install dependencies
 npm install
-# Copy and configure environment
-cp .env.example .env
-# Edit .env to add Google API credentials if using Google search
+# Setup (required for first run unless in creator mode)
+npm run setup
+# Or provide API key via environment:
+export CRAWLFORGE_API_KEY="your_api_key_here"
-# Run the server
+# Creator Mode (bypass API key requirement for development)
+export BYPASS_API_KEY=true
 npm start
+# Run the server (production)
+npm start
+# Development mode with verbose logging
+npm run dev
 # Test MCP protocol compliance
-# NOTE: test-server.js doesn't exist, use integration tests instead
-npm run test:integration      # Integration tests including MCP compliance
-# Lint checks (no linter configured yet, placeholder)
-npm run lint
-# Performance tests
-npm run test:performance       # Full performance test suite
-npm run test:performance:quick # Quick performance tests
-npm run test:load             # Load testing
-npm run test:memory           # Memory usage tests
-npm run test:benchmark        # Component benchmarks
-npm run test:integration      # Integration tests
-npm run test:security         # Security test suite
-npm run test:all             # Run all tests
-# Wave 2 Validation Tests
-node tests/validation/test-wave2-runner.js  # Test Wave 2 features
-node tests/validation/test-batch-scrape.js  # Test batch scraping
-node tests/validation/test-scrape-with-actions.js  # Test action scraping
-node tests/integration/master-test-runner.js # Run master test suite
-# Wave 3 Tests
-npm run test:wave3             # Full Wave 3 validation
-npm run test:wave3:quick       # Quick Wave 3 tests
-npm run test:wave3:verbose     # Verbose Wave 3 output
-npm run test:unit:wave3        # Wave 3 unit tests (Jest)
-npm run test:integration:wave3 # Wave 3 integration tests
+npm test
+# Functional tests
+node test-tools.js             # Test all tools (basic, Wave 2, Wave 3)
+node test-real-world.js        # Test real-world usage scenarios
+# MCP Protocol tests
+node tests/integration/mcp-protocol-compliance.test.js  # MCP protocol compliance
 # Docker commands
 npm run docker:build         # Build Docker image
@@ -71,18 +59,19 @@ npm run release:major       # Major version bump
 npm run clean              # Remove cache, logs, test results
 # Running specific test files
-node tests/unit/linkAnalyzer.test.js          # Unit test for link analyzer
-node tests/validation/wave3-validation.js     # Wave 3 validation suite
-node tests/security/security-test-suite.js    # Security test suite
+node tests/integration/mcp-protocol-compliance.test.js  # MCP protocol compliance
+node test-tools.js                                      # All tools functional test
+node test-real-world.js                                 # Real-world scenarios test
 ```
 ## High-Level Architecture
 ### Core Infrastructure (`src/core/`)
+- **AuthManager**: Authentication, credit tracking, and usage reporting (supports creator mode)
 - **PerformanceManager**: Centralized performance monitoring and optimization
 - **JobManager**: Asynchronous job tracking and management for batch operations
 - **WebhookDispatcher**: Event notification system for job completion callbacks
-- **ActionExecutor**: Browser automation engine for complex interactions
+- **ActionExecutor**: Browser automation engine for complex interactions (Playwright-based)
 - **ResearchOrchestrator**: Coordinates multi-stage research with query expansion and synthesis
 - **StealthBrowserManager**: Manages stealth mode scraping with anti-detection features
 - **LocalizationManager**: Handles multi-language content and localization
@@ -96,21 +85,50 @@ Tools are organized in subdirectories by category:
 - `extract/` - analyzeContent, extractContent, processDocument, summarizeContent
 - `research/` - deepResearch
 - `search/` - searchWeb and provider adapters (Google, DuckDuckGo)
-- `tracking/` - trackChanges
+- `tracking/` - trackChanges (currently disabled in server.js)
+- `llmstxt/` - generateLLMsTxt
+### Available MCP Tools (18 total)
+**Basic Tools (server.js inline):**
+- fetch_url, extract_text, extract_links, extract_metadata, scrape_structured
+**Advanced Tools:**
+- search_web (conditional - requires search provider), crawl_deep, map_site
+- extract_content, process_document, summarize_content, analyze_content
+- batch_scrape, scrape_with_actions, deep_research
+- generate_llms_txt, stealth_mode, localization
+**Note:** track_changes tool is implemented but currently commented out in server.js (line 1409-1535)
 ### MCP Server Entry Point
 The main server implementation is in `server.js` which:
-1. Uses stdio transport for MCP protocol communication
-2. Registers all 16 tools using `server.registerTool()` pattern
-3. Each tool has inline Zod schema for parameter validation
-4. Parameter extraction from `request.params?.arguments` structure
-5. Response format uses `content` array with text objects
+1. **Authentication Flow**: Uses AuthManager for API key validation and credit tracking
+   - Checks for authentication on startup (skipped in creator mode)
+   - Auto-setup if CRAWLFORGE_API_KEY environment variable is present
+   - Creator mode enabled via BYPASS_API_KEY=true
+2. **Tool Registration**: All tools registered via `server.registerTool()` pattern
+   - Wrapped with `withAuth()` function for credit tracking and authentication
+   - Each tool has inline Zod schema for parameter validation
+   - Response format uses `content` array with text objects
+3. **Transport**: Uses stdio transport for MCP protocol communication
+4. **Graceful Shutdown**: Cleans up browser instances, job managers, and other resources
+### Tool Credit System
+Each tool wrapped with `withAuth(toolName, handler)`:
+- Checks credits before execution (skipped in creator mode)
+- Reports usage with credit deduction on success
+- Charges half credits on error
+- Returns credit error if insufficient balance
 ### Key Configuration
-Critical environment variables:
+Critical environment variables defined in `src/constants/config.js`:
 ```bash
+# Authentication (required unless in creator mode)
+CRAWLFORGE_API_KEY=your_api_key_here
+BYPASS_API_KEY=true  # Enable creator mode for development
 # Search Provider (auto, google, duckduckgo)
 SEARCH_PROVIDER=auto
@@ -130,6 +148,11 @@ MAX_PAGES_PER_CRAWL=100
 RESPECT_ROBOTS_TXT=true
 ```
+### Configuration Files
+- `~/.crawlforge/config.json` - User authentication and API key storage
+- `.env` - Environment variables for development
+- `src/constants/config.js` - Central configuration with defaults and validation
 ## Common Development Tasks
 ### Running a Single Test
@@ -158,11 +181,22 @@ node tests/validation/wave3-validation.js
 ```
 ### Debugging Tips
-- Server logs are written to console via Winston logger
+- Server logs are written to console via Winston logger (stderr for status, stdout for MCP protocol)
 - Set `NODE_ENV=development` for verbose logging
-- Use `--expose-gc` flag for memory profiling tests
+- Use `--expose-gc` flag for memory profiling: `node --expose-gc server.js`
 - Check `cache/` directory for cached responses
 - Review `logs/` directory for application logs
+- Use creator mode during development to bypass authentication: `BYPASS_API_KEY=true npm start`
+- Memory monitoring automatically enabled in development mode (logs every 60s if >200MB)
+### Adding New Tools
+When adding a new tool to server.js:
+1. Import the tool class from `src/tools/`
+2. Instantiate the tool (with config if needed)
+3. Register with `server.registerTool(name, { description, inputSchema }, withAuth(name, handler))`
+4. Ensure tool implements `execute(params)` method
+5. Add to cleanup array in gracefulShutdown if it has `destroy()` or `cleanup()` methods
+6. Update tool count in console log at server startup (line 1860)
 ## CI/CD Security Integration
@@ -313,3 +347,66 @@ The security integration ensures that:
 - Comprehensive audit trails are maintained
 - Automated remediation guidance is provided
+## Important Implementation Patterns
+### Tool Structure
+All tools follow a consistent class-based pattern:
+```javascript
+export class ToolName {
+  constructor(config) {
+    this.config = config;
+    // Initialize resources
+  }
+  async execute(params) {
+    // Validate params (Zod validation done in server.js)
+    // Execute tool logic
+    // Return structured result
+    return { success: true, data: {...} };
+  }
+  async destroy() {
+    // Cleanup resources (browsers, connections, etc.)
+  }
+}
+```
+### Search Provider Architecture
+Search providers implement a factory pattern:
+- `searchProviderFactory.js` selects provider based on config
+- Providers implement common interface: `search(query, options)`
+- Auto-fallback: Google → DuckDuckGo if Google credentials missing
+- Each provider in `src/tools/search/adapters/`
+### Browser Management
+- Playwright used for browser automation (ActionExecutor, ScrapeWithActionsTool)
+- Stealth features in StealthBrowserManager
+- Always cleanup browsers in error handlers
+- Context isolation per operation for security
+### Memory Management
+Critical for long-running processes:
+- Graceful shutdown handlers registered for SIGINT/SIGTERM
+- All tools with heavy resources must implement `destroy()` or `cleanup()`
+- Memory monitoring in development mode (server.js line 1955-1963)
+- Force GC on shutdown if available
+### Error Handling Pattern
+```javascript
+try {
+  const result = await tool.execute(params);
+  return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
+} catch (error) {
+  return {
+    content: [{ type: "text", text: `Operation failed: ${error.message}` }],
+    isError: true
+  };
+}
+```
+### Configuration Validation
+- All config in `src/constants/config.js` with defaults
+- `validateConfig()` checks required settings
+- Environment variables parsed with fallbacks
+- Config errors only fail in production (warnings in dev)

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "crawlforge-mcp-server",
-  "version": "3.0.0",
+  "version": "3.0.1",
   "description": "CrawlForge MCP Server - Professional Model Context Protocol server with 16+ comprehensive web scraping, crawling, and content processing tools.",
   "main": "server.js",
   "bin": {
@@ -12,6 +12,9 @@
     "setup": "node setup.js",
     "dev": "cross-env NODE_ENV=development node server.js",
     "test": "node tests/integration/mcp-protocol-compliance.test.js",
+    "test:tools": "node test-tools.js",
+    "test:real-world": "node test-real-world.js",
+    "test:all": "bash run-all-tests.sh",
     "postinstall": "echo '\n🎉 CrawlForge MCP Server installed!\n\nRun \"npx crawlforge-setup\" to configure your API key and get started.\n'",
     "docker:build": "docker build -t crawlforge .",
     "docker:dev": "docker-compose up crawlforge-dev",