crawlforge-mcp-server 4.6.6 → 4.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/README.md +29 -34
package/package.json +1 -1
package/server.js +8 -9
package/src/core/AuthManager.js +44 -50
package/src/core/LLMsTxtAnalyzer.js +10 -1
package/src/core/ResearchOrchestrator.js +5 -3
package/src/server/withAuth.js +5 -9
package/src/tools/llmstxt/generateLLMsTxt.js +3 -1
package/src/tools/research/deepResearch.js +4 -1
package/src/tools/search/searchWeb.js +5 -4

package/README.md CHANGED

@@ -67,9 +67,9 @@
 npm install -g crawlforge-mcp-server
 ```
-### 2. Setup Your API Key (optional for the free local tools)
+### 2. Setup Your API Key (required)
-The 15 free local tools work immediately with **no API key at all** — skip straight to step 3 if that's all you need. To unlock the metered premium tools (`search_web`, `crawl_deep`, `stealth_mode`, `agent`, …):
+Every tool requires a CrawlForge API key — new accounts get 1,000 free trial credits to start:
 ```bash
 npx crawlforge-setup
@@ -150,43 +150,38 @@ Restart Cursor to activate.
 ## 📊 Available Tools
-CrawlForge is **open-core**: 15 tools run locally on your machine and are **completely free — no API key required**. The metered premium tools cover real infrastructure (search fees, proxies, browser farms) and need an API key.
-**Free Local Tools** (0 credits, no API key needed)
-| Tool | What it does |
-|------|--------------|
-| `fetch_url` | Fetch content from any URL |
-| `extract_text` | Extract clean text from web pages |
-| `extract_links` | Get all links from a page |
-| `extract_metadata` | Extract page metadata (title, OG tags, schema.org) |
-| `scrape` | **Unified single-fetch, multi-format extraction.** Pass a `formats` array (markdown/html/rawHtml/text/links/metadata/screenshot/json-schema) plus `onlyMainContent`; one fetch serves every requested format with per-format partial-success warnings. *The `screenshot` format is the one metered exception (2 credits — needs a server browser)* |
-| `scrape_structured` | Extract structured data with CSS selectors |
-| `scrape_template` | Structured data from well-known sites (Amazon, GitHub, LinkedIn, YouTube, Reddit, Hacker News, npm, and more) without writing selectors |
-| `extract_content` | Enhanced content extraction |
-| `summarize_content` | Generate intelligent summaries |
-| `analyze_content` | Comprehensive content analysis |
-| `extract_structured` | LLM-powered schema-driven extraction (your own LLM key or local Ollama) |
-| `extract_with_llm` | Natural-language extraction. **Defaults to a local Ollama model — no API key, no API costs.** Pass `provider: "openai" \| "anthropic"` with the matching key for cloud models |
-| `process_document` | Multi-format document processing |
-| `list_ollama_models` | List the Ollama models installed locally (helps you pick a `model` for `extract_with_llm`) |
-| `get_batch_results` | Retrieve paginated results for a `batch_scrape` job by `batchId` |
-**Metered Premium Tools** (3–10 credits, API key required)
+CrawlForge requires a CrawlForge API key — **every tool is metered and consumes credits**. New accounts get **1,000 free trial credits** to start. Get a key at [crawlforge.dev/signup](https://www.crawlforge.dev/signup).
+**All Tools** (API key required)
 | Tool | Credits | What it does |
 |------|---------|--------------|
-| `map_site` | 3 | Discover and map website structure (optional `search=` ranks the discovered URLs) |
+| `fetch_url` | 1 | Fetch content from any URL |
+| `extract_text` | 1 | Extract clean text from web pages |
+| `extract_links` | 1 | Get all links from a page |
+| `extract_metadata` | 1 | Extract page metadata (title, OG tags, schema.org) |
+| `scrape_template` | 1 | Structured data from well-known sites (Amazon, GitHub, LinkedIn, YouTube, Reddit, Hacker News, npm, and more) without writing selectors |
+| `list_ollama_models` | 1 | List the Ollama models installed locally (helps you pick a `model` for `extract_with_llm`) |
+| `get_batch_results` | 1 | Retrieve paginated results for a `batch_scrape` job by `batchId` |
+| `scrape` | 2 | **Unified single-fetch, multi-format extraction.** Pass a `formats` array (markdown/html/rawHtml/text/links/metadata/screenshot/json-schema) plus `onlyMainContent`; one fetch serves every requested format with per-format partial-success warnings |
+| `scrape_structured` | 2 | Extract structured data with CSS selectors |
+| `extract_content` | 2 | Enhanced content extraction |
+| `map_site` | 2 | Discover and map website structure (optional `search=` ranks the discovered URLs) |
+| `process_document` | 2 | Multi-format document processing |
+| `localization` | 2 | Multi-language and geo-location management |
 | `track_changes` | 3 | Monitor content changes over time |
+| `analyze_content` | 3 | Comprehensive content analysis |
+| `extract_structured` | 3 | LLM-powered schema-driven extraction (your own LLM key or local Ollama) |
+| `extract_with_llm` | 3 | Natural-language extraction. Defaults to a local Ollama model; pass `provider: "openai" \| "anthropic"` with the matching key for cloud models (external LLM billed by your provider) |
+| `summarize_content` | 4 | Generate intelligent summaries |
+| `crawl_deep` | 4 | Deep crawl entire websites |
 | `search_web` | 5 | Search the web using Google Search API |
-| `crawl_deep` | 5 | Deep crawl entire websites |
 | `batch_scrape` | 5 | Process multiple URLs simultaneously |
 | `scrape_with_actions` | 5 | Browser automation chains |
 | `generate_llms_txt` | 5 | Generate AI interaction guidelines |
-| `localization` | 5 | Multi-language and geo-location management |
+| `stealth_mode` | 5 | Anti-detection browser management |
 | `agent` | 8 | **Autonomous research/extraction from a natural-language prompt — no URLs required.** Plans, gathers, and shapes an answer under hard safety stops (max steps/URLs/wall-clock enforced by the orchestrator, never the LLM) |
 | `deep_research` | 10 | Multi-stage research with source verification |
-| `stealth_mode` | 10 | Anti-detection browser management |
 For the full canonical capabilities reference (all tools, CLI commands, stealth engines, research workflow), see [SKILL.md](SKILL.md).
@@ -194,7 +189,7 @@ For the full canonical capabilities reference (all tools, CLI commands, stealth
 ## 💳 Pricing
-**15 local tools are free forever — no API key, no credit card.** Credits only meter the premium tools that run on CrawlForge infrastructure.
+**Every tool is metered and requires an API key.** New accounts get 1,000 free trial credits — no credit card required to start.
 | Plan | Credits/Month | Best For |
 |------|---------------|----------|
@@ -204,7 +199,7 @@ For the full canonical capabilities reference (all tools, CLI commands, stealth
 | **Business** ($399) | 250,000 | Large scale operations |
 **All plans include:**
-- Access to all 26 tools (the 15 local tools never consume credits)
+- Access to all 26 tools
 - Credits never expire and roll over month-to-month
 - API access and webhook notifications
@@ -238,7 +233,7 @@ export RESEARCH_MAX_STEALTH_RETRIES="8"    # cap on stealth retries per research
 ### Local-LLM quickstart (`extract_with_llm` with Ollama)
-`extract_with_llm` defaults to a local Ollama model — no API key, no API costs, no data leaving your machine.
+`extract_with_llm` defaults to a local Ollama model — no LLM-provider key, no per-token LLM costs, and no data leaving your machine (the CrawlForge credit cost still applies).
 ```bash
 # 1. Install Ollama:  https://ollama.com
@@ -303,7 +298,7 @@ Once configured, use these tools in your AI assistant:
 ## 🔒 Security & Privacy
-- **Secure Authentication**: API keys required for all metered premium tools (the 15 free local tools run without one)
+- **Secure Authentication**: API keys required for all metered tools
 - **Local Storage**: API keys stored securely at `~/.crawlforge/config.json`
 - **HTTPS Only**: All connections use encrypted HTTPS
 - **No Data Retention**: We don't store scraped data, only usage logs
@@ -317,7 +312,7 @@ Once configured, use these tools in your AI assistant:
 - **Action allowlist**: `scrape_with_actions` accepts only 7 action types (`wait`, `click`, `type`, `press`, `scroll`, `screenshot`, `executeJavaScript`). No download, file-write, or arbitrary cross-page navigation primitives exist.
 - **JavaScript gate**: The `executeJavaScript` action throws by default. Set `ALLOW_JAVASCRIPT_EXECUTION=true` at deploy time to enable (not recommended in production).
 - **MCP Elicitation** (v3.6.0): Four tools request user confirmation before executing expensive operations — `deep_research` (>50 URLs), `batch_scrape` (sync mode, >25 URLs), `crawl_deep` (projected >500 pages), `extract_structured` (schema has >3 required fields with no LLM configured). Credit-low situations also elicit. Confirmation is best-effort: if the MCP client does not support elicitation the tool proceeds (fail-open).
-- **Per-tool credit gating**: Every tool is wrapped with `withAuth()`; metered tools check and deduct credits before execution (fail-closed since v3.0.18). Free local tools (cost 0) skip the credit path entirely.
+- **Per-tool credit gating**: Every tool is wrapped with `withAuth()` and is metered — credits are checked and deducted before execution, and a valid API key is required for every tool (fail-closed since v3.0.18).
 See [docs/sandboxing-and-approvals.md](docs/sandboxing-and-approvals.md) for the full reference.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "crawlforge-mcp-server",
-  "version": "4.6.6",
+  "version": "4.7.1",
   "mcpName": "io.github.mysleekdesigns/crawlforge-mcp-server",
   "description": "CrawlForge MCP Server - Professional Model Context Protocol server with 26 web scraping, crawling, deep-research, and autonomous-extraction tools. Returns clean Markdown and structured JSON for Claude, Cursor, and any MCP client. Defaults to local Ollama for LLM extraction (no API key needed); OpenAI/Anthropic available as opt-in. Includes a unified multi-format scrape tool, an autonomous agent, pre-built site templates, and Camoufox stealth browsing.",
   "main": "server.js",

package/server.js CHANGED

@@ -68,15 +68,14 @@ if (!AuthManager.isAuthenticated() && !AuthManager.isCreatorMode()) {
       process.exit(1);
     }
   } else {
-    // Open-core Phase 2: no API key is fine — start in free-tier mode.
-    // Tier-0 tools (cost 0) run locally without a key; Tier-1 metered tools
-    // return a "not configured" error until a key is set.
+    // Every tool is metered and requires an API key — there is no free tier.
+    // The server still starts so the MCP client can list tools, but every
+    // tool call errors with "not configured" until a key is set.
     // Status → stderr; stdout is reserved for the MCP JSON-RPC stream.
-    console.error('ℹ️  CrawlForge running in free-tier mode (no API key configured).');
-    console.error('   Free local tools work out of the box. Premium tools (search_web,');
-    console.error('   crawl_deep, stealth_mode, agent, deep_research, …) need an API key:');
-    console.error('   get one at https://www.crawlforge.dev/signup, then run `npm run setup`');
-    console.error('   or set CRAWLFORGE_API_KEY.');
+    console.error('⚠️  No CrawlForge API key configured — all tools require a key.');
+    console.error('   Every tool (fetch_url, search_web, deep_research, …) is metered.');
+    console.error('   Get a key at https://www.crawlforge.dev/signup, then run `npm run setup`');
+    console.error('   or set CRAWLFORGE_API_KEY. Tool calls will error until a key is set.');
   }
 }
@@ -90,7 +89,7 @@ if (configErrors.length > 0 && config.server.nodeEnv === 'production') {
 // Create the server
 const server = new McpServer({
   name: "crawlforge",
-  version: "4.6.6",
+  version: "4.7.1",
   description: "Production-ready MCP server with 26 web scraping, crawling, and content processing tools. Features MCP Resources (crawlforge://), Prompts, Sampling fallback, Elicitation, stealth browsing, deep research, structured extraction, change tracking, local-LLM extraction via Ollama, unified multi-format scrape, and autonomous agent tool.",
   homepage: "https://www.crawlforge.dev",
   icon: "https://www.crawlforge.dev/icon.png"

package/src/core/AuthManager.js CHANGED

@@ -239,11 +239,6 @@ class AuthManager {
       return true;
     }
-    // Open-core Phase 2: Tier-0 tools cost 0 and run without an API key
-    if (estimatedCredits === 0) {
-      return true;
-    }
     if (!this.config) {
       throw new Error('CrawlForge not configured. Run setup first.');
     }
@@ -507,51 +502,53 @@ class AuthManager {
   /**
    * Get credit cost for a tool.
    *
-   * Open-core Phase 1 (docs/tier-map.md): this table is the single source of
-   * truth shared with the backend (crawlforge-website/src/lib/credits.ts).
-   * Tier 0 tools run locally on the user's machine and cost 0; Tier 1 tools
-   * are metered per COGS.
+   * Every tool is metered and requires an API key — there is no free tier.
+   * This table is the single source of truth shared with the backend
+   * (crawlforge-website/src/lib/credits.ts TOOL_CREDIT_COSTS).
    *
    * @param {string} tool
-   * @param {object} [params] — invocation params; only used for per-call
-   *        exceptions (scrape's screenshot format needs a server browser).
    */
-  getToolCost(tool, params) {
-    // Tier-0 exception: the screenshot format of `scrape` is browser-backed
-    if (tool === 'scrape' && Array.isArray(params?.formats) && params.formats.includes('screenshot')) {
-      return 2;
-    }
+  getToolCost(tool) {
     const costs = {
-      // Tier 0 — free, local (key optional)
-      fetch_url: 0,
-      extract_text: 0,
-      extract_links: 0,
-      extract_metadata: 0,
-      scrape_structured: 0,
-      scrape_template: 0,
-      extract_content: 0,
-      scrape: 0, // 2 if formats includes 'screenshot' (handled above)
-      summarize_content: 0,
-      analyze_content: 0,
-      extract_with_llm: 0,
-      extract_structured: 0,
-      process_document: 0,
-      list_ollama_models: 0,
-      get_batch_results: 0, // retrieval of an already-paid batch job
-      // Tier 1 — metered (costs reflect COGS)
-      map_site: 3,
+      // 1 credit
+      fetch_url: 1,
+      extract_text: 1,
+      extract_links: 1,
+      extract_metadata: 1,
+      scrape_template: 1,
+      list_ollama_models: 1,
+      get_batch_results: 1, // retrieval of an already-paid batch job
+      // 2 credits
+      scrape_structured: 2,
+      extract_content: 2,
+      map_site: 2,
+      process_document: 2,
+      localization: 2,
+      scrape: 2,
+      // 3 credits
       track_changes: 3,
-      generate_llms_txt: 5,
-      search_web: 5,
-      crawl_deep: 5,
-      batch_scrape: 5,
+      analyze_content: 3,
+      extract_structured: 3,
+      extract_with_llm: 3,
+      // 4 credits
+      summarize_content: 4,
+      crawl_deep: 4,
+      // 5 credits
+      stealth_mode: 5,
       scrape_with_actions: 5,
-      localization: 5,
+      batch_scrape: 5,
+      search_web: 5,
+      generate_llms_txt: 5,
+      // 8 credits
       agent: 8, // projectCost() scales with maxUrls
-      deep_research: 10,
-      stealth_mode: 10
+      // 10 credits
+      deep_research: 10
     };
     return costs[tool] ?? 1;
@@ -574,7 +571,7 @@ class AuthManager {
     // Override for tools whose cost scales with params
     let projected = base;
-    let note = base === 0 ? 'Free local tool — no credits charged.' : 'Fixed cost per invocation.';
+    let note = 'Fixed cost per invocation.';
     switch (toolName) {
       case 'batch_scrape': {
@@ -596,14 +593,11 @@ class AuthManager {
         break;
       }
       case 'extract_with_llm':
-        note = 'Free local tool. External LLM API call billed by your LLM provider, not in credits.';
+        note = 'External LLM API call billed by your LLM provider, separate from the credit cost.';
         break;
       case 'scrape': {
-        // Free local tool; only the browser-backed screenshot format is metered
         projected = base;
-        note = base > 0
-          ? 'screenshot format requires a server browser (2 credits). Other formats are free.'
-          : 'Free local tool — no credits charged. json format may incur external LLM cost.';
+        note = 'Fixed cost per invocation. json format may incur external LLM cost (billed by your provider).';
         break;
       }
       case 'agent': {
@@ -614,7 +608,7 @@ class AuthManager {
         break;
       }
       default:
-        note = base === 0 ? 'Free local tool — no credits charged.' : 'Fixed cost per invocation.';
+        note = 'Fixed cost per invocation.';
     }
     return { projected, note };
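
The 4.7.1 cost table can be exercised on its own. The following is a minimal sketch, not the package's module: `TOOL_COSTS` copies the values from the added lines of the `getToolCost()` hunk above, and the standalone `getToolCost` function is a hypothetical stand-in for the `AuthManager` method. It illustrates that `scrape` is now a flat 2 credits and that an unlisted tool name falls back to 1 credit through `??`.

```javascript
// Values copied from the "+" lines of the getToolCost() hunk above (4.7.1).
const TOOL_COSTS = {
  // 1 credit
  fetch_url: 1, extract_text: 1, extract_links: 1, extract_metadata: 1,
  scrape_template: 1, list_ollama_models: 1, get_batch_results: 1,
  // 2 credits
  scrape_structured: 2, extract_content: 2, map_site: 2,
  process_document: 2, localization: 2, scrape: 2,
  // 3 credits
  track_changes: 3, analyze_content: 3, extract_structured: 3, extract_with_llm: 3,
  // 4 credits
  summarize_content: 4, crawl_deep: 4,
  // 5 credits
  stealth_mode: 5, scrape_with_actions: 5, batch_scrape: 5,
  search_web: 5, generate_llms_txt: 5,
  // 8 and 10 credits
  agent: 8, deep_research: 10,
};

// Hypothetical standalone stand-in for AuthManager.getToolCost(tool).
// Unknown tool names fall back to 1 credit, as in the diff.
function getToolCost(tool) {
  return TOOL_COSTS[tool] ?? 1;
}

console.log(getToolCost('scrape'));        // 2
console.log(getToolCost('deep_research')); // 10
console.log(getToolCost('not_a_tool'));    // 1
```

Because the new signature accepts only `tool`, the `params` argument that `withAuth.js` still passes in `authManager.getToolCost(toolName, params)` is simply ignored by JavaScript.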

package/src/core/LLMsTxtAnalyzer.js CHANGED

@@ -50,7 +50,16 @@ export class LLMsTxtAnalyzer {
       apis: [],
       contentTypes: {},
       securityAreas: [],
-      rateLimit: {},
+      // Conservative defaults so output never renders `undefined` when live
+      // rate-limit probing is skipped (analyzeRateLimiting only runs with
+      // probeRateLimit:true). Overwritten with measured values when probed.
+      rateLimit: {
+        recommendedDelay: 1000,
+        maxConcurrency: 5,
+        recommendedRPM: 30,
+        reasoning: 'Conservative defaults applied; live rate-limit probing was not performed (pass probeRateLimit:true to measure actual response times).',
+        averageResponseTime: null
+      },
       guidelines: {},
       metadata: {},
       errors: []

package/src/core/ResearchOrchestrator.js CHANGED

@@ -32,6 +32,7 @@ export class ResearchOrchestrator extends EventEmitter {
       concurrency = 5,
       enableSourceVerification = true,
       enableConflictDetection = true,
+      credibilityThreshold = 0.3,
       cacheEnabled = true,
       cacheTTL = 1800000, // 30 minutes
       researchApproach = 'broad',
@@ -61,6 +62,7 @@ export class ResearchOrchestrator extends EventEmitter {
     this.concurrency = Math.min(Math.max(1, concurrency), 20);
     this.enableSourceVerification = enableSourceVerification;
     this.enableConflictDetection = enableConflictDetection;
+    this.credibilityThreshold = Math.min(Math.max(0, credibilityThreshold), 1);
     // Stealth fallback config + lazy state (browser launched only on first block)
     this.enableStealthFallback = enableStealthFallback;
@@ -859,7 +861,7 @@ export class ResearchOrchestrator extends EventEmitter {
         }
         // Only include sources that meet minimum credibility threshold
-        if (overallCredibility >= 0.3) {
+        if (overallCredibility >= this.credibilityThreshold) {
           verifiedSources.push({
             ...source,
             credibilityFactors,
@@ -1360,7 +1362,7 @@ export class ResearchOrchestrator extends EventEmitter {
   generateKeyFindings(claimGroups, sources) {
     return claimGroups
-      .filter(group => group.avgCredibility >= 0.3)
+      .filter(group => group.avgCredibility >= this.credibilityThreshold)
       .sort((a, b) => b.consensusStrength - a.consensusStrength)
       .slice(0, 10)
       .map(group => ({
@@ -1373,7 +1375,7 @@ export class ResearchOrchestrator extends EventEmitter {
   compileSupportingEvidence(sources) {
     return sources
-      .filter(source => source.overallCredibility >= 0.3)
+      .filter(source => source.overallCredibility >= this.credibilityThreshold)
       .map(source => ({
         title: source.title,
         url: source.link,
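
The effect of the new `credibilityThreshold` option can be sketched in isolation. The helper names and sample sources below are hypothetical and exist only for illustration; the default of 0.3 and the clamp to the range 0 to 1 are taken from the constructor lines in the hunks above.

```javascript
// Hypothetical helper mirroring the constructor logic in the diff:
// default of 0.3 via destructuring, then clamp into [0, 1].
function resolveCredibilityThreshold({ credibilityThreshold = 0.3 } = {}) {
  return Math.min(Math.max(0, credibilityThreshold), 1);
}

// Illustrative sources; the scores are made up for this sketch.
const sources = [
  { title: 'A', overallCredibility: 0.25 },
  { title: 'B', overallCredibility: 0.45 },
  { title: 'C', overallCredibility: 0.9 },
];

// Same comparison the diff uses: keep sources at or above the threshold.
function keepCredible(list, threshold) {
  return list
    .filter((source) => source.overallCredibility >= threshold)
    .map((source) => source.title);
}

console.log(resolveCredibilityThreshold({}));                            // 0.3
console.log(resolveCredibilityThreshold({ credibilityThreshold: 1.7 })); // 1
console.log(keepCredible(sources, resolveCredibilityThreshold({})));     // [ 'B', 'C' ]
console.log(
  keepCredible(sources, resolveCredibilityThreshold({ credibilityThreshold: 0.5 }))
); // [ 'C' ]
```

An explicitly `undefined` value still picks up the 0.3 default, because destructuring defaults apply to `undefined`. A `NaN` value is not caught by the clamp, and since `x >= NaN` is always false, it would filter out every source; whether the tool's input schema already rules that out is not visible in this diff.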

package/src/server/withAuth.js CHANGED

@@ -4,8 +4,8 @@
  * (OpenTelemetry spans + Prometheus counters) added in v3.2.0.
  *
  * Contract:
- *   - resolves toolCost once per call (params-aware; 0-cost Tier-0 tools skip
- *     the credit check and usage reports entirely — open-core Phase 2)
+ *   - resolves toolCost once per call; every tool is metered (no free tier),
+ *     so a valid API key is required for every invocation
  *   - try/finally guarantees a single `tool invocation` log line per call
  *   - log payload: { toolName, paramHash, durationMs, outcome, creditCost, creatorMode }
  *   - outcome ∈ { 'success' | 'error' | 'insufficient_credits' }
@@ -36,16 +36,12 @@ export function makeWithAuth({ authManager, logger, metrics = null }) {
       const startTime = Date.now();
       const paramHash = hashParams(params);
       const creatorMode = authManager.isCreatorMode();
-      // Params-aware: scrape's screenshot format is metered, other formats free
       const creditCost = creatorMode ? 0 : authManager.getToolCost(toolName, params);
-      // Open-core Phase 2: Tier-0 tools (cost 0) run locally for free — no
-      // credit check, no usage report, and no API key required.
-      const freeTier = creditCost === 0;
       let outcome = 'pending';
       let thrown = null;
       try {
-        if (!creatorMode && !freeTier) {
+        if (!creatorMode) {
           const hasCredits = await authManager.checkCredits(creditCost);
           if (!hasCredits) {
             outcome = 'insufficient_credits';
@@ -90,7 +86,7 @@ export function makeWithAuth({ authManager, logger, metrics = null }) {
           // Cost injection must never break the request path
         }
-        if (!creatorMode && !freeTier) {
+        if (!creatorMode) {
           await authManager.reportUsage(toolName, creditCost, params, 200, Date.now() - startTime);
         }
@@ -98,7 +94,7 @@ export function makeWithAuth({ authManager, logger, metrics = null }) {
       } catch (error) {
         outcome = 'error';
         thrown = error;
-        if (!creatorMode && !freeTier) {
+        if (!creatorMode) {
           await authManager.reportUsage(
             toolName,
             Math.max(1, Math.floor(creditCost * 0.5)),
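
With the free-tier branch removed, the error path above now runs for every non-creator call, so the amount reported on failure is worth working through. The sketch below isolates the expression `Math.max(1, Math.floor(creditCost * 0.5))` from the hunk; the function name `errorCharge` is hypothetical, and how the backend treats the reported value is not shown in this diff.

```javascript
// Hypothetical name for the expression used in the catch block above:
// half the tool cost, rounded down, but never less than 1.
function errorCharge(creditCost) {
  return Math.max(1, Math.floor(creditCost * 0.5));
}

// Costs that appear in the 4.7.1 table: 1, 2, 3, 4, 5, 8, 10.
for (const cost of [1, 2, 3, 4, 5, 8, 10]) {
  console.log(`${cost} -> ${errorCharge(cost)}`);
}
// 1 -> 1, 2 -> 1, 3 -> 1, 4 -> 2, 5 -> 2, 8 -> 4, 10 -> 5
```

For 1-credit tools the floor of 1 means a failed call is reported at the same amount as a successful one.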

package/src/tools/llmstxt/generateLLMsTxt.js CHANGED

@@ -391,7 +391,9 @@ export class GenerateLLMsTxtTool {
       lines.push('');
       lines.push('### Technical Justification');
       lines.push(`${analysis.rateLimit.reasoning}`);
-      lines.push(`Average response time: ${analysis.rateLimit.averageResponseTime}ms`);
+      if (analysis.rateLimit.averageResponseTime != null) {
+        lines.push(`Average response time: ${analysis.rateLimit.averageResponseTime}ms`);
+      }
       lines.push('');
     }
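
This guard pairs with the new `rateLimit` defaults in `LLMsTxtAnalyzer.js`, where `averageResponseTime` is `null` until a probe runs. A minimal sketch of the combined behaviour follows; `DEFAULT_RATE_LIMIT` abbreviates the `reasoning` string from the diff, and `renderTechnicalJustification` is a hypothetical helper, since the real tool pushes onto a larger `lines` array.

```javascript
// Defaults copied from the LLMsTxtAnalyzer.js hunk (reasoning text shortened).
const DEFAULT_RATE_LIMIT = {
  recommendedDelay: 1000,
  maxConcurrency: 5,
  recommendedRPM: 30,
  reasoning: 'Conservative defaults applied; live rate-limit probing was not performed.',
  averageResponseTime: null,
};

// Hypothetical helper reproducing the guarded block from generateLLMsTxt.js.
function renderTechnicalJustification(rateLimit) {
  const lines = ['### Technical Justification', `${rateLimit.reasoning}`];
  // `!= null` is false for both null and undefined, so the line is skipped
  // unless a measured value exists. A measured 0 would still be printed.
  if (rateLimit.averageResponseTime != null) {
    lines.push(`Average response time: ${rateLimit.averageResponseTime}ms`);
  }
  return lines;
}

// Without probing: only the heading and reasoning are emitted.
console.log(renderTechnicalJustification(DEFAULT_RATE_LIMIT).length); // 2

// With a measured value (240 is an illustrative number).
const probed = { ...DEFAULT_RATE_LIMIT, averageResponseTime: 240 };
console.log(renderTechnicalJustification(probed)[2]); // Average response time: 240ms
```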

package/src/tools/research/deepResearch.js CHANGED

@@ -272,6 +272,10 @@ export class DeepResearchTool {
       maxUrls: params.maxUrls,
       timeLimit: params.timeLimit,
       concurrency: params.concurrency,
+      // Minimum credibility a source must clear in verifySourceCredibility.
+      // Must be on the orchestrator *constructor* config (not the
+      // conductResearch options) — that is the only place it is now read.
+      credibilityThreshold: params.credibilityThreshold,
       // The orchestrator tunes its query expansion to the approach (commercial
       // vs academic vs current-events); without this it always used academic
       // variations, which poisoned commercial/comparative searches.
@@ -356,7 +360,6 @@ export class DeepResearchTool {
   buildResearchOptions(params) {
     return {
       sourceTypes: params.sourceTypes,
-      credibilityThreshold: params.credibilityThreshold,
       includeRecentOnly: params.includeRecentOnly,
       queryExpansion: params.queryExpansion,
       enableConflictDetection: params.enableConflictDetection,

package/src/tools/search/searchWeb.js CHANGED

@@ -79,9 +79,10 @@ export class SearchWebTool {
     // Check for Creator Mode - allows search without API key for development/testing
     const isCreatorMode = isCreatorModeVerified();
-    // Open-core Phase 2: no API key is allowed at construction time (the server
-    // now starts in free-tier mode without one). The key requirement is
-    // enforced at execute() time instead, so Tier-0 tools keep working.
+    // The server can start without a key so the MCP client can list tools, so
+    // construction must not throw here. Every tool is metered and the key
+    // requirement is enforced before execute() runs (withAuth credit check)
+    // and again at execute() time below.
     if (!apiKey && !isCreatorMode) {
       this.searchAdapter = null;
       this.isCreatorModeFallback = false;
@@ -127,7 +128,7 @@ export class SearchWebTool {
       }
       // --- end SearXNG short-circuit ---
-      // Free-tier mode: search via the CrawlForge proxy needs an API key
+      // Search via the CrawlForge proxy needs an API key
       if (!this.searchAdapter) {
         throw new Error('CrawlForge API key is required for search functionality. Get one at https://www.crawlforge.dev/signup');
       }
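
The pattern described in the updated comments, construction that never throws with the key requirement enforced at call time, can be sketched as follows. `SearchToolSketch` and its stub adapter are hypothetical, Creator Mode is left out, and `execute` is synchronous here for brevity; only the error message is copied from the diff.

```javascript
// Hypothetical, simplified stand-in for SearchWebTool.
class SearchToolSketch {
  constructor({ apiKey } = {}) {
    // No key: leave the adapter unset instead of throwing, so a server
    // can still start and list its tools.
    this.searchAdapter = apiKey
      ? { search: (query) => [`stub result for ${query}`] }
      : null;
  }

  execute(query) {
    // The key requirement is enforced when the tool is actually called.
    if (!this.searchAdapter) {
      throw new Error(
        'CrawlForge API key is required for search functionality. Get one at https://www.crawlforge.dev/signup'
      );
    }
    return this.searchAdapter.search(query);
  }
}

const unconfigured = new SearchToolSketch(); // does not throw
try {
  unconfigured.execute('mcp servers');
} catch (error) {
  console.log(error.message);
}

const configured = new SearchToolSketch({ apiKey: 'example-key' });
console.log(configured.execute('mcp servers')); // [ 'stub result for mcp servers' ]
```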