crawlforge-mcp-server 3.0.11 → 3.0.13
- package/CLAUDE.md +103 -324
- package/package.json +2 -1
- package/server.js +332 -156
- package/src/core/AuthManager.js +22 -9
- package/src/core/ChangeTracker.js +1 -1
- package/src/core/ResearchOrchestrator.js +43 -5
- package/src/core/analysis/ContentAnalyzer.js +70 -17
- package/src/core/analysis/sentenceUtils.js +73 -0
- package/src/core/creatorMode.js +47 -0
- package/src/core/llm/LLMManager.js +120 -0
- package/src/core/processing/BrowserProcessor.js +1 -1
- package/src/tools/extract/extractStructured.js +280 -0
- package/src/tools/extract/summarizeContent.js +3 -2
- package/src/tools/search/ranking/ResultDeduplicator.js +21 -21
- package/src/tools/search/searchWeb.js +2 -1
package/CLAUDE.md
CHANGED
@@ -2,12 +2,67 @@
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
+Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.
+
+Tradeoff: These guidelines bias toward caution over speed. For trivial tasks, use judgment.
+
+1. Think Before Coding
+Don't assume. Don't hide confusion. Surface tradeoffs.
+
+Before implementing:
+
+State your assumptions explicitly. If uncertain, ask.
+If multiple interpretations exist, present them - don't pick silently.
+If a simpler approach exists, say so. Push back when warranted.
+If something is unclear, stop. Name what's confusing. Ask.
+
+2. Simplicity First
+Minimum code that solves the problem. Nothing speculative.
+
+No features beyond what was asked.
+No abstractions for single-use code.
+No "flexibility" or "configurability" that wasn't requested.
+No error handling for impossible scenarios.
+If you write 200 lines and it could be 50, rewrite it.
+Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
+
+3. Surgical Changes
+Touch only what you must. Clean up only your own mess.
+
+When editing existing code:
+
+Don't "improve" adjacent code, comments, or formatting.
+Don't refactor things that aren't broken.
+Match existing style, even if you'd do it differently.
+If you notice unrelated dead code, mention it - don't delete it.
+When your changes create orphans:
+
+Remove imports/variables/functions that YOUR changes made unused.
+Don't remove pre-existing dead code unless asked.
+The test: Every changed line should trace directly to the user's request.
+
+4. Goal-Driven Execution
+Define success criteria. Loop until verified.
+
+Transform tasks into verifiable goals:
+
+"Add validation" → "Write tests for invalid inputs, then make them pass"
+"Fix the bug" → "Write a test that reproduces it, then make it pass"
+"Refactor X" → "Ensure tests pass before and after"
+For multi-step tasks, state a brief plan:
+
+1. [Step] → verify: [check]
+2. [Step] → verify: [check]
+3. [Step] → verify: [check]
+Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
+
+These guidelines are working if: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
+
 ## Project Overview
 
-CrawlForge MCP Server - A professional MCP (Model Context Protocol) server
+CrawlForge MCP Server - A professional MCP (Model Context Protocol) server providing 19 web scraping, crawling, and content processing tools.
 
-**Current Version:** 3.0.
-**Security Status:** Secure (authentication bypass vulnerability fixed in v3.0.3)
+**Current Version:** 3.0.12
 
 ## Development Commands
 
@@ -28,6 +83,9 @@ export CRAWLFORGE_API_KEY="your_api_key_here"
 # Run the server (production)
 npm start
 
+# HTTP transport mode
+npm run start:http
+
 # Development mode with verbose logging
 npm run dev
 
@@ -35,39 +93,27 @@ npm run dev
 npm test
 
 # Functional tests
-node test-tools.js # Test all tools
+node test-tools.js # Test all tools
 node test-real-world.js # Test real-world usage scenarios
 
 # MCP Protocol tests
-node tests/integration/mcp-protocol-compliance.test.js
+node tests/integration/mcp-protocol-compliance.test.js
 
-# Docker
+# Docker
 npm run docker:build # Build Docker image
 npm run docker:dev # Run development container
 npm run docker:prod # Run production container
-npm run docker:test # Run test container
-npm run docker:perf # Run performance test container
-
-# Security Testing (CI/CD Integration)
-npm run test:security # Run comprehensive security test suite
-npm audit # Check for dependency vulnerabilities
-npm audit fix # Automatically fix vulnerabilities
-npm outdated # Check for outdated packages
-
-# Release management
-npm run release:patch # Patch version bump
-npm run release:minor # Minor version bump
-npm run release:major # Major version bump
-
-# Cleanup
-npm run clean # Remove cache, logs, test results
-
-# Running specific test files
-node tests/integration/mcp-protocol-compliance.test.js # MCP protocol compliance
-node test-tools.js # All tools functional test
-node test-real-world.js # Real-world scenarios test
 ```
 
+### Debugging Tips
+
+- Server logs via Winston logger (stderr for status, stdout for MCP protocol)
+- Set `NODE_ENV=development` for verbose logging
+- Use `--expose-gc` flag for memory profiling: `node --expose-gc server.js`
+- Check `cache/` directory for cached responses
+- Review `logs/` directory for application logs
+- Memory monitoring auto-enabled in development mode (logs every 60s if >200MB)
+
 ## High-Level Architecture
 
 ### Core Infrastructure (`src/core/`)
@@ -76,12 +122,12 @@ node test-real-world.js # Real-world scenarios t
 - **PerformanceManager**: Centralized performance monitoring and optimization
 - **JobManager**: Asynchronous job tracking and management for batch operations
 - **WebhookDispatcher**: Event notification system for job completion callbacks
-- **ActionExecutor**: Browser automation engine
-- **ResearchOrchestrator**:
-- **StealthBrowserManager**:
-- **LocalizationManager**:
-- **ChangeTracker**:
-- **SnapshotManager**:
+- **ActionExecutor**: Browser automation engine (Playwright-based)
+- **ResearchOrchestrator**: Multi-stage research with query expansion and synthesis
+- **StealthBrowserManager**: Stealth mode scraping with anti-detection
+- **LocalizationManager**: Multi-language content and localization
+- **ChangeTracker**: Content change tracking over time
+- **SnapshotManager**: Website snapshots and version history
 
 ### Tool Layer (`src/tools/`)
 
@@ -91,45 +137,26 @@ Tools are organized in subdirectories by category:
 - `crawl/` - crawlDeep, mapSite
 - `extract/` - analyzeContent, extractContent, processDocument, summarizeContent
 - `research/` - deepResearch
-- `search/` - searchWeb (
+- `search/` - searchWeb (proxied through CrawlForge.dev API)
 - `tracking/` - trackChanges
 - `llmstxt/` - generateLLMsTxt
 
 ### Available MCP Tools (19 total)
 
 **Basic Tools (server.js inline):**
-
-- fetch_url, extract_text, extract_links, extract_metadata, scrape_structured
+fetch_url, extract_text, extract_links, extract_metadata, scrape_structured
 
 **Advanced Tools:**
-
-- search_web, crawl_deep, map_site
-- extract_content, process_document, summarize_content, analyze_content
-- batch_scrape, scrape_with_actions, deep_research
-- track_changes, generate_llms_txt, stealth_mode, localization
+search_web, crawl_deep, map_site, extract_content, process_document, summarize_content, analyze_content, batch_scrape, scrape_with_actions, deep_research, track_changes, generate_llms_txt, stealth_mode, localization
 
 ### MCP Server Entry Point
 
 The main server implementation is in `server.js` which:
 
-1. **Secure Creator Mode
-
-
-
-- Hash stored in code is safe to commit (one-way cryptographic hash)
-
-2. **Authentication Flow**: Uses AuthManager for API key validation and credit tracking
-- Checks for authentication on startup
-- Auto-setup if CRAWLFORGE_API_KEY environment variable is present
-- Creator mode bypasses credit checks for development/testing
-
-3. **Tool Registration**: All tools registered via `server.registerTool()` pattern
-- Wrapped with `withAuth()` function for credit tracking and authentication
-- Each tool has inline Zod schema for parameter validation
-- Response format uses `content` array with text objects
-
-4. **Transport**: Uses stdio transport for MCP protocol communication
-
+1. **Secure Creator Mode**: Loads `.env` early, validates secret via SHA256 hash comparison
+2. **Authentication Flow**: AuthManager for API key validation and credit tracking
+3. **Tool Registration**: All tools registered via `server.registerTool()`, wrapped with `withAuth()` for credit tracking
+4. **Transport**: stdio transport for MCP protocol communication
 5. **Graceful Shutdown**: Cleans up browser instances, job managers, and other resources
 
 ### Tool Credit System
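The hunk above describes creator mode as validating a secret "via SHA256 hash comparison". The following is a minimal sketch of that idea using Node's built-in `crypto` module, not the package's actual code. The helper names `sha256Hex` and `isCreatorSecretValid` and the placeholder secret are assumptions made for illustration; the constant-time comparison is a common precaution rather than something the diff confirms.

```javascript
const { createHash, timingSafeEqual } = require("node:crypto");

// Hypothetical helper: hex-encoded SHA-256 digest of a UTF-8 string.
function sha256Hex(value) {
  return createHash("sha256").update(value, "utf8").digest("hex");
}

// Hypothetical check: hash the supplied secret and compare it with a stored digest.
function isCreatorSecretValid(secret, expectedHashHex) {
  if (typeof secret !== "string" || secret.length === 0) return false;
  const actual = Buffer.from(sha256Hex(secret), "hex");
  const expected = Buffer.from(expectedHashHex, "hex");
  // timingSafeEqual throws when lengths differ, so check the length first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// Placeholder values for illustration only.
const storedHash = sha256Hex("example-secret");
console.log(isCreatorSecretValid("example-secret", storedHash)); // true
console.log(isCreatorSecretValid("wrong-secret", storedHash)); // false
```

Only the digest needs to live in the source tree under this scheme; the secret itself would come from the environment (the removed documentation names `CRAWLFORGE_CREATOR_SECRET`).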
@@ -139,7 +166,6 @@ Each tool wrapped with `withAuth(toolName, handler)`:
 - Checks credits before execution (skipped in creator mode)
 - Reports usage with credit deduction on success
 - Charges half credits on error
-- Returns credit error if insufficient balance
 - Creator mode: Unlimited access for package maintainer
 
 ### Key Configuration
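The credit rules listed in the hunk above (check before execution, full charge on success, half charge on error, no charge in creator mode) can be sketched as a higher-order function. This is an illustrative stand-in, not the real `withAuth`: the documented signature is `withAuth(toolName, handler)`, while this sketch takes an extra `store` argument so it can run without the package's AuthManager. `createCreditStore`, `TOOL_COSTS`, the cost of `fetch_url`, and the insufficient-credit response are all assumptions.

```javascript
// Hypothetical in-memory stand-in for the credit backend.
function createCreditStore(balance, { creatorMode = false } = {}) {
  return {
    balance,
    creatorMode,
    hasCredits(cost) {
      return this.creatorMode || this.balance >= cost;
    },
    charge(cost) {
      if (!this.creatorMode) this.balance -= cost;
    },
  };
}

// Hypothetical per-tool costs (the removed docs mention 2 credits per search).
const TOOL_COSTS = { search_web: 2, fetch_url: 1 };

function withAuth(store, toolName, handler) {
  const cost = TOOL_COSTS[toolName] ?? 1;
  return async (params) => {
    // Credit check before execution; always passes in creator mode.
    if (!store.hasCredits(cost)) {
      return { content: [{ type: "text", text: `Insufficient credits for ${toolName}` }], isError: true };
    }
    try {
      const result = await handler(params);
      store.charge(cost); // full charge on success
      return result;
    } catch (error) {
      store.charge(cost / 2); // half charge on error
      return { content: [{ type: "text", text: `Error: ${error.message}` }], isError: true };
    }
  };
}

const demoStore = createCreditStore(10);
const search = withAuth(demoStore, "search_web", async () => ({
  content: [{ type: "text", text: "results" }],
}));
search({}).then(() => console.log(demoStore.balance)); // 8
```

Keeping the charging logic in the wrapper means individual tool handlers stay unaware of credits, which matches the way the documentation describes registration.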
@@ -147,21 +173,11 @@ Each tool wrapped with `withAuth(toolName, handler)`:
 Critical environment variables defined in `src/constants/config.js`:
 
 ```bash
-# Authentication (required for users)
 CRAWLFORGE_API_KEY=your_api_key_here
-
-# Creator Mode (maintainer only - KEEP SECRET!)
-# CRAWLFORGE_CREATOR_SECRET=your-uuid-secret
-# Enables unlimited access for development/testing
-
-
-# Performance Settings
 MAX_WORKERS=10
 QUEUE_CONCURRENCY=10
 CACHE_TTL=3600000
 RATE_LIMIT_REQUESTS_PER_SECOND=10
-
-# Crawling Limits
 MAX_CRAWL_DEPTH=5
 MAX_PAGES_PER_CRAWL=100
 RESPECT_ROBOTS_TXT=true
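The environment variables in the hunk above are described as living in `src/constants/config.js` "with defaults and validation". A rough sketch of reading them with fallbacks might look like the following. The helper names, the camelCase property names, and the choice to treat the listed values as defaults are assumptions; reading `CACHE_TTL` as milliseconds is also a guess based on its magnitude.

```javascript
// Hypothetical helpers; the real src/constants/config.js may differ.
function intFromEnv(env, name, fallback) {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function boolFromEnv(env, name, fallback) {
  const raw = env[name];
  return raw === undefined ? fallback : raw.trim().toLowerCase() === "true";
}

// Build a config object from an environment map, falling back to the listed values.
function loadConfig(env = process.env) {
  return {
    maxWorkers: intFromEnv(env, "MAX_WORKERS", 10),
    queueConcurrency: intFromEnv(env, "QUEUE_CONCURRENCY", 10),
    cacheTtlMs: intFromEnv(env, "CACHE_TTL", 3600000),
    rateLimitPerSecond: intFromEnv(env, "RATE_LIMIT_REQUESTS_PER_SECOND", 10),
    maxCrawlDepth: intFromEnv(env, "MAX_CRAWL_DEPTH", 5),
    maxPagesPerCrawl: intFromEnv(env, "MAX_PAGES_PER_CRAWL", 100),
    respectRobotsTxt: boolFromEnv(env, "RESPECT_ROBOTS_TXT", true),
  };
}

console.log(loadConfig({ MAX_CRAWL_DEPTH: "3" }).maxCrawlDepth); // 3
```

Passing the environment in as a parameter keeps the function easy to exercise in tests without mutating `process.env`.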
@@ -173,44 +189,6 @@ RESPECT_ROBOTS_TXT=true
 - `.env` - Environment variables for development
 - `src/constants/config.js` - Central configuration with defaults and validation
 
-## Common Development Tasks
-
-### Running a Single Test
-
-```bash
-# Run a specific test file
-node tests/unit/linkAnalyzer.test.js
-
-# Run a specific Wave test
-node tests/validation/test-batch-scrape.js
-
-# Run Wave 3 tests with verbose output
-npm run test:wave3:verbose
-```
-
-### Testing Tool Integration
-
-```bash
-# Test MCP protocol compliance
-npm test
-
-# Test specific tool functionality
-node tests/validation/test-batch-scrape.js
-node tests/validation/test-scrape-with-actions.js
-
-# Test research features
-node tests/validation/wave3-validation.js
-```
-
-### Debugging Tips
-
-- Server logs are written to console via Winston logger (stderr for status, stdout for MCP protocol)
-- Set `NODE_ENV=development` for verbose logging
-- Use `--expose-gc` flag for memory profiling: `node --expose-gc server.js`
-- Check `cache/` directory for cached responses
-- Review `logs/` directory for application logs
-- Memory monitoring automatically enabled in development mode (logs every 60s if >200MB)
-
 ### Adding New Tools
 
 When adding a new tool to server.js:
@@ -220,225 +198,35 @@ When adding a new tool to server.js:
 3. Register with `server.registerTool(name, { description, inputSchema }, withAuth(name, handler))`
 4. Ensure tool implements `execute(params)` method
 5. Add to cleanup array in gracefulShutdown if it has `destroy()` or `cleanup()` methods
-6. Update tool count in console log at server startup
-
-## CI/CD Security Integration
-
-### Automated Security Testing Pipeline
-
-The project includes comprehensive security testing integrated into the CI/CD pipeline:
-
-#### Main CI Pipeline (`.github/workflows/ci.yml`)
-
-The CI pipeline runs on every PR and push to main/develop branches and includes:
-
-**Security Test Suite:**
-
-- SSRF Protection validation
-- Input validation (XSS, SQL injection, command injection)
-- Rate limiting functionality
-- DoS protection measures
-- Regex DoS vulnerability detection
+6. Update tool count in console log at server startup
 
-
+## Security
 
-
-- Vulnerability severity analysis (critical/high/moderate/low)
-- License compliance checking
-- Outdated package detection
+Security testing and CI/CD pipeline details are in:
 
-
+- `docs/security-audit-report.md` — Full security audit
+- `.github/workflows/ci.yml` — CI pipeline with security checks
+- `.github/workflows/security.yml` — Daily scheduled security scanning
+- `.github/SECURITY.md` — Security policy and procedures
 
-
-- ESLint security rules for dangerous patterns
-- Hardcoded secret detection
-- Security file scanning
+Run `npm audit` locally to check dependencies.
 
-
-
-- Comprehensive security reports generated
-- PR comments with security summaries
-- Artifact upload for detailed analysis
-- Build failure on critical vulnerabilities
-
-#### Dedicated Security Workflow (`.github/workflows/security.yml`)
-
-Daily scheduled comprehensive security scanning:
-
-**Dependency Security Scan:**
-
-- Full vulnerability audit with configurable severity levels
-- License compliance verification
-- Detailed vulnerability reporting
-
-**Static Code Analysis:**
-
-- Extended CodeQL analysis with security-focused queries
-- ESLint security plugin integration
-- Pattern-based secret detection
-
-**Container Security:**
-
-- Trivy vulnerability scanning
-- SARIF report generation
-- Container base image analysis
-
-**Automated Issue Creation:**
-
-- GitHub issues created for critical vulnerabilities
-- Detailed security reports with remediation steps
-- Configurable severity thresholds
-
-### Security Thresholds and Policies
-
-**Build Failure Conditions:**
-
-- Any critical severity vulnerabilities
-- More than 3 high severity vulnerabilities
-- Security test suite failures
-
-**Automated Actions:**
-
-- Daily security scans at 2 AM UTC
-- PR blocking for security failures
-- Automatic security issue creation
-- Comprehensive artifact collection
-
-### Running Security Tests Locally
-
-```bash
-# Run the complete security test suite
-npm run test:security
-
-# Check for dependency vulnerabilities
-npm audit --audit-level moderate
-
-# Fix automatically resolvable vulnerabilities
-npm audit fix
-
-# Generate security report manually
-mkdir security-results
-npm audit --json > security-results/audit.json
-
-# Run specific security validation
-node tests/security/security-test-suite.js
-```
-
-### Security Artifacts and Reports
-
-**Generated Reports:**
-
-- `SECURITY-REPORT.md`: Comprehensive security assessment
-- `npm-audit.json`: Detailed vulnerability data
-- `security-tests.log`: Test execution logs
-- `dependency-analysis.md`: Package security analysis
-- `license-check.md`: License compliance report
-
-**Artifact Retention:**
-
-- CI security results: 30 days
-- Comprehensive security reports: 90 days
-- Critical vulnerability reports: Indefinite
-
-### Manual Security Scan Triggers
-
-The security workflow can be manually triggered with custom parameters:
-
-```bash
-# Via GitHub CLI
-gh workflow run security.yml \
-  --field scan_type=all \
-  --field severity_threshold=moderate
-
-# Via GitHub UI
-# Go to Actions > Security Scanning > Run workflow
-```
-
-**Available Options:**
-
-- `scan_type`: all, dependencies, code-analysis, container-scan
-- `severity_threshold`: low, moderate, high, critical
-
-### Security Integration Best Practices
-
-**For Contributors:**
-
-1. Always run `npm run test:security` before submitting PRs
-2. Address any security warnings in your code
-3. Keep dependencies updated with `npm audit fix`
-4. Review security artifacts when CI fails
-
-**For Maintainers:**
-
-1. Review security reports weekly
-2. Respond to automated security issues promptly
-3. Keep security thresholds updated
-4. Monitor trending vulnerabilities in dependencies
-
-### Security Documentation
-
-Comprehensive security documentation is available in:
-
-- `.github/SECURITY.md` - Complete security policy and procedures
-- Security workflow logs and artifacts
-- Generated security reports in CI runs
-
-The security integration ensures that:
-
-- No critical vulnerabilities reach production
-- Security issues are detected early in development
-- Comprehensive audit trails are maintained
-- Automated remediation guidance is provided
-
-## Important Implementation Patterns
+## Implementation Patterns
 
 ### Tool Structure
 
-All tools follow a consistent class-based pattern:
-
 ```javascript
 export class ToolName {
-  constructor(config) {
-    this.config = config;
-    // Initialize resources
-  }
+  constructor(config) { this.config = config; }
 
   async execute(params) {
-    // Validate params (Zod validation done in server.js)
-    // Execute tool logic
-    // Return structured result
     return { success: true, data: {...} };
   }
 
-  async destroy() {
-    // Cleanup resources (browsers, connections, etc.)
-  }
+  async destroy() { /* cleanup resources */ }
 }
 ```
 
-### Search Provider Architecture
-
-All search requests are proxied through the CrawlForge.dev API:
-
-- `crawlforgeSearch.js` - Proxies through CrawlForge.dev API (Google Search backend)
-- No Google API credentials needed from users
-- Users only need their CrawlForge API key
-- Credit cost: 2 credits per search
-
-Factory in `src/tools/search/adapters/searchProviderFactory.js`
-
-### Browser Management
-- Context isolation per operation for security
-
-### Memory Management
-
-Critical for long-running processes:
-
-- Graceful shutdown handlers registered for SIGINT/SIGTERM
-- All tools with heavy resources must implement `destroy()` or `cleanup()`
-- Memory monitoring in development mode (server.js line 1955-1963)
-- Force GC on shutdown if available
-
 ### Error Handling Pattern
 
 ```javascript
@@ -453,19 +241,10 @@ try {
 }
 ```
 
-
-
-- All config in `src/constants/config.js` with defaults
-- `validateConfig()` checks required settings
-- Environment variables parsed with fallbacks
-- Config errors only fail in production (warnings in dev)
-
-## 🎯 Project Management Rules
-
-## 🎯 Project Management Rules
+## Project Management Rules
 
--
--
--
--
--
+- Always have the project manager work with the appropriate sub agents in parallel
+- Each sub agent must work on their strengths; when done they report to the project manager who updates `docs/PRODUCTION_READINESS.md`
+- Whenever a phase is completed, push all changes to GitHub
+- Put all documentation md files into the `docs/` folder
+- Every time you finish a phase run `npm run build` and fix all errors before pushing
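The CLAUDE.md diff above describes tools as classes with `execute(params)` and an optional `destroy()` or `cleanup()`, collected in a cleanup array for graceful shutdown. A small sketch of how those two pieces could fit together follows. `EchoTool` and `shutdownTools` are hypothetical names invented for this example, and the real shutdown code in `server.js` may be organized differently.

```javascript
// Hypothetical tool following the documented shape:
// constructor(config), async execute(params), async destroy().
class EchoTool {
  constructor(config) {
    this.config = config;
    this.destroyed = false;
  }

  async execute(params) {
    return { success: true, data: { echoed: params.text } };
  }

  async destroy() {
    this.destroyed = true; // stand-in for closing browsers, connections, etc.
  }
}

// Hypothetical shutdown helper: call destroy() or cleanup() on each tool that has one.
async function shutdownTools(tools) {
  const results = await Promise.allSettled(
    tools.map(async (tool) => {
      if (typeof tool.destroy === "function") return tool.destroy();
      if (typeof tool.cleanup === "function") return tool.cleanup();
    })
  );
  // Count failures instead of throwing, so one failing tool does not block the rest.
  return results.filter((result) => result.status === "rejected").length;
}

const echo = new EchoTool({});
echo
  .execute({ text: "hi" })
  .then((result) => console.log(result.data.echoed)) // hi
  .then(() => shutdownTools([echo]))
  .then((failures) => console.log(failures)); // 0
```

Using `Promise.allSettled` here is a design choice of the sketch: every tool gets a chance to release its resources even if an earlier one rejects.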
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "crawlforge-mcp-server",
-  "version": "3.0.
+  "version": "3.0.13",
   "description": "CrawlForge MCP Server - Professional Model Context Protocol server with 19 comprehensive web scraping, crawling, and content processing tools.",
   "main": "server.js",
   "bin": {
@@ -9,6 +9,7 @@
   },
   "scripts": {
     "start": "node server.js",
+    "start:http": "node server.js --http",
     "setup": "node setup.js",
     "dev": "cross-env NODE_ENV=development node server.js",
     "test": "node tests/integration/mcp-protocol-compliance.test.js",