npm - @monostate/node-scraper - Versions diffs - 1.2.0 → 1.5.0 - Mend

@monostate/node-scraper 1.2.0 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/README.md CHANGED

@@ -19,7 +19,11 @@ yarn add @monostate/node-scraper
 pnpm add @monostate/node-scraper
 ```
-**🎉 New in v1.2.0**: Lightpanda binary is now automatically downloaded and configured during installation! No manual setup required.
+**🤖 New in v1.5.0**: AI-powered Q&A! Ask questions about any website using OpenRouter, OpenAI, or built-in AI. (Note: v1.4.0 was an internal release)
+**🎉 Also in v1.3.0**: PDF parsing support added! Automatically extracts text, metadata, and page count from PDF documents.
+**✨ Also in v1.2.0**: Lightpanda binary is now automatically downloaded and configured during installation! No manual setup required.
 ### Zero-Configuration Setup
@@ -45,6 +49,10 @@ console.log(screenshot.screenshot); // Base64 encoded image
 // Quick screenshot (optimized for speed)
 const quick = await quickShot('https://example.com');
 console.log(quick.screenshot); // Fast screenshot capture
+// PDF parsing (automatic detection)
+const pdfResult = await smartScrape('https://example.com/document.pdf');
+console.log(pdfResult.content); // Extracted text, metadata, page count
 ```
 ### Advanced Usage
@@ -66,12 +74,13 @@ await scraper.cleanup(); // Clean up resources
 ## 🔧 How It Works
-BNCA uses a sophisticated 3-tier fallback system:
+BNCA uses a sophisticated multi-tier system with intelligent detection:
 ### 1. 🔄 Direct Fetch (Fastest)
 - Pure HTTP requests with intelligent HTML parsing
 - **Performance**: Sub-second responses
 - **Success rate**: 75% of websites
+- **PDF Detection**: Automatically detects PDFs by URL, content-type, or magic bytes
 ### 2. 🐼 Lightpanda Browser (Fast)
 - Lightweight browser engine (2-3x faster than Chromium)
@@ -83,6 +92,12 @@ BNCA uses a sophisticated 3-tier fallback system:
 - **Performance**: Complete JavaScript execution
 - **Fallback triggers**: Complex interactions needed
+### 📄 PDF Parser (Specialized)
+- Automatic PDF detection and parsing
+- **Features**: Text extraction, metadata, page count
+- **Smart Detection**: Works even when PDFs are served with wrong content-types
+- **Performance**: Typically 100-500ms for most PDFs
 ### 📸 Screenshot Methods
 - **Chrome CLI**: Direct Chrome screenshot capture
 - **Quickshot**: Optimized with retry logic and smart timeouts
@@ -186,6 +201,101 @@ Clean up resources (close browser instances).
 await scraper.cleanup();
 ```
+### 🤖 AI-Powered Q&A
+Ask questions about any website and get AI-generated answers:
+```javascript
+// Method 1: Using your own OpenRouter API key
+const scraper = new BNCASmartScraper({
+  openRouterApiKey: 'your-openrouter-api-key'
+});
+const result = await scraper.askAI('https://example.com', 'What is this website about?');
+// Method 2: Using OpenAI API (or compatible endpoints)
+const scraper = new BNCASmartScraper({
+  openAIApiKey: 'your-openai-api-key',
+  // Optional: Use a compatible endpoint like Groq, Together AI, etc.
+  openAIBaseUrl: 'https://api.groq.com/openai'
+});
+const result = await scraper.askAI('https://example.com', 'What services do they offer?');
+// Method 3: One-liner with OpenRouter
+import { askWebsiteAI } from '@monostate/node-scraper';
+const answer = await askWebsiteAI('https://example.com', 'What is the main topic?', {
+  openRouterApiKey: process.env.OPENROUTER_API_KEY
+});
+// Method 4: Using BNCA backend API (requires BNCA API key)
+const scraper = new BNCASmartScraper({
+  apiKey: 'your-bnca-api-key'
+});
+const result = await scraper.askAI('https://example.com', 'What products are featured?');
+```
+**API Key Priority:**
+1. OpenRouter API key (`openRouterApiKey`)
+2. OpenAI API key (`openAIApiKey`)
+3. BNCA backend API (`apiKey`)
+4. Local fallback (pattern matching - no API key required)
+**Configuration Options:**
+```javascript
+const result = await scraper.askAI(url, question, {
+  // OpenRouter specific
+  openRouterApiKey: 'sk-or-...',
+  model: 'meta-llama/llama-4-scout:free', // Default model
+  // OpenAI specific
+  openAIApiKey: 'sk-...',
+  openAIBaseUrl: 'https://api.openai.com', // Or compatible endpoint
+  model: 'gpt-3.5-turbo',
+  // Shared options
+  temperature: 0.3,
+  maxTokens: 500
+});
+```
+**Response Format:**
+```javascript
+{
+  success: true,
+  answer: "This website is about...",
+  method: "direct-fetch",     // Scraping method used
+  scrapeTime: 1234,          // Time to scrape in ms
+  processing: "openrouter"   // AI processing method used
+}
+```
+### 📄 PDF Support
+BNCA automatically detects and parses PDF documents:
+```javascript
+const pdfResult = await smartScrape('https://example.com/document.pdf');
+// Parsed content includes:
+const content = JSON.parse(pdfResult.content);
+console.log(content.title);          // PDF title
+console.log(content.author);         // Author name
+console.log(content.pages);          // Number of pages
+console.log(content.text);           // Full extracted text
+console.log(content.creationDate);   // Creation date
+console.log(content.metadata);       // Additional metadata
+```
+**PDF Detection Methods:**
+- URL ending with `.pdf`
+- Content-Type header `application/pdf`
+- Binary content starting with `%PDF` (magic bytes)
+- Works with PDFs served as `application/octet-stream` (e.g., GitHub raw files)
+**Limitations:**
+- Maximum file size: 20MB
+- Text extraction only (no image OCR)
+- Requires `pdf-parse` dependency (automatically installed)
 ## 📱 Next.js Integration
 ### API Route Example
@@ -354,7 +464,14 @@ const result: ScrapingResult = await scraper.scrape('https://example.com');
 ## 📋 Changelog
-### v1.2.0 (Latest)
+### v1.3.0 (Latest)
+- 📄 **PDF Support**: Full PDF parsing with text extraction, metadata, and page count
+- 🔍 **Smart PDF Detection**: Detects PDFs by URL patterns, content-type, or magic bytes
+- 🚀 **Robust Parsing**: Handles PDFs served with incorrect content-types (e.g., GitHub raw files)
+- ⚡ **Fast Performance**: PDF parsing typically completes in 100-500ms
+- 📊 **Comprehensive Extraction**: Title, author, creation date, page count, and full text
+### v1.2.0
 - 🎉 **Auto-Installation**: Lightpanda binary is now automatically downloaded during `npm install`
 - 🔧 **Cross-Platform Support**: Automatic detection and installation for macOS, Linux, and Windows/WSL
 - ⚡ **Improved Performance**: Enhanced binary detection and ES6 module compatibility
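
Review note: the README hunk above lists three PDF detection signals (URL ending in `.pdf`, a `Content-Type` of `application/pdf`, and content starting with the `%PDF` magic bytes). The snippet below is a minimal sketch of how those signals could be combined. `looksLikePdf` is a hypothetical helper written for illustration and is not part of the package; the package's own logic is in the `package/index.js` diff.

```javascript
// Hypothetical helper (not part of @monostate/node-scraper): combines the
// three signals named in the README diff. Returns each signal separately so
// a caller can see which one fired.
function looksLikePdf(url, contentType = '', bytes = new Uint8Array()) {
  const lowerUrl = url.toLowerCase();
  // Signal 1: URL pattern (".pdf" at the end, or ".pdf" followed by a query string)
  const byUrl = lowerUrl.endsWith('.pdf') || lowerUrl.includes('.pdf?');
  // Signal 2: Content-Type header
  const byHeader = contentType.toLowerCase().includes('application/pdf');
  // Signal 3: magic bytes, "%PDF" is 0x25 0x50 0x44 0x46
  const magic = [0x25, 0x50, 0x44, 0x46];
  const byMagic = bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
  return { byUrl, byHeader, byMagic, isPdf: byUrl || byHeader || byMagic };
}

// A PDF served as application/octet-stream from a URL without a .pdf suffix
// is still caught by the magic-byte check.
const sampleBytes = new TextEncoder().encode('%PDF-1.7\n');
console.log(looksLikePdf('https://example.com/raw/file', 'application/octet-stream', sampleBytes));
```

The magic-byte signal is the one that covers the "wrong content-type" case the README mentions, since it does not depend on what the server declares.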

package/bin/lightpanda CHANGED

File without changes

package/index.d.ts CHANGED

@@ -13,6 +13,24 @@ export interface ScrapingOptions {
   lightpandaPath?: string;
   /** Custom user agent string */
   userAgent?: string;
+  /** BNCA API key for backend services */
+  apiKey?: string;
+  /** BNCA API URL (defaults to https://bnca-api.fly.dev) */
+  apiUrl?: string;
+  /** OpenRouter API key for AI processing */
+  openRouterApiKey?: string;
+  /** OpenAI API key for AI processing */
+  openAIApiKey?: string;
+  /** OpenAI base URL (for compatible endpoints) */
+  openAIBaseUrl?: string;
+  /** AI model to use */
+  model?: string;
+  /** AI temperature setting */
+  temperature?: number;
+  /** Maximum tokens for AI response */
+  maxTokens?: number;
+  /** HTTP referer for OpenRouter */
+  referer?: string;
 }
 export interface ScrapingResult {
@@ -146,6 +164,22 @@ export class BNCASmartScraper {
    * @returns Promise resolving to screenshot result
    */
   quickshot(url: string, options?: ScrapingOptions): Promise<ScrapingResult>;
+  /**
+   * Ask AI a question about a URL
+   * @param url The URL to analyze
+   * @param question The question to answer about the page
+   * @param options Optional configuration overrides
+   * @returns Promise resolving to AI answer
+   */
+  askAI(url: string, question: string, options?: ScrapingOptions): Promise<{
+    success: boolean;
+    answer?: string;
+    error?: string;
+    method?: string;
+    scrapeTime?: number;
+    processing?: 'openrouter' | 'openai' | 'backend' | 'local';
+  }>;
   /**
    * Get performance statistics for all methods
@@ -248,6 +282,22 @@ export function smartScreenshot(url: string, options?: ScrapingOptions): Promise
  */
 export function quickShot(url: string, options?: ScrapingOptions): Promise<ScrapingResult>;
+/**
+ * Convenience function for asking AI questions about a webpage
+ * @param url The URL to analyze
+ * @param question The question to answer
+ * @param options Optional configuration
+ * @returns Promise resolving to AI answer
+ */
+export function askWebsiteAI(url: string, question: string, options?: ScrapingOptions): Promise<{
+  success: boolean;
+  answer?: string;
+  error?: string;
+  method?: string;
+  scrapeTime?: number;
+  processing?: 'openrouter' | 'openai' | 'backend' | 'local';
+}>;
 /**
  * Default export - same as BNCASmartScraper class
  */
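
Review note: in the declared return type of `askAI` and `askWebsiteAI`, every field except `success` is optional, so callers have to handle missing values on both the success and the failure path. A minimal sketch of one way to consume that shape follows. `summarizeAiResult` is a hypothetical name used only for illustration; the field names are taken from the declarations above.

```javascript
// Hypothetical consumer helper (not part of the package). It only relies on
// the fields declared in index.d.ts: success, answer, error, method,
// scrapeTime, processing.
function summarizeAiResult(result) {
  if (!result.success) {
    // error is optional in the declared type, so provide a fallback label
    return `failed: ${result.error ?? 'unknown error'}`;
  }
  const via = result.processing ?? 'unknown';
  const method = result.method ?? 'unknown';
  const ms = typeof result.scrapeTime === 'number' ? `${result.scrapeTime}ms` : 'n/a';
  return `answer via ${via} (scraped with ${method} in ${ms}): ${result.answer ?? ''}`;
}

console.log(summarizeAiResult({
  success: true,
  answer: 'This website is about...',
  method: 'direct-fetch',
  scrapeTime: 1234,
  processing: 'openrouter'
}));
console.log(summarizeAiResult({ success: false, error: 'Failed to scrape URL: timeout' }));
```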

package/index.js CHANGED

@@ -5,6 +5,7 @@ import { existsSync, statSync } from 'fs';
 import path from 'path';
 import { fileURLToPath } from 'url';
 import { promises as fsPromises } from 'fs';
+import pdfParse from 'pdf-parse/lib/pdf-parse.js';
 let puppeteer = null;
 try {
@@ -42,10 +43,237 @@ export class BNCASmartScraper {
     this.stats = {
       directFetch: { attempts: 0, successes: 0 },
       lightpanda: { attempts: 0, successes: 0 },
-      puppeteer: { attempts: 0, successes: 0 }
+      puppeteer: { attempts: 0, successes: 0 },
+      pdf: { attempts: 0, successes: 0 }
     };
   }
+  /**
+   * Ask AI a question about a URL
+   * Scrapes the URL and uses AI to answer the question
+   *
+   * @param {string} url - URL to analyze
+   * @param {string} question - Question to answer
+   * @param {object} options - Additional options
+   * @returns {Promise<object>} AI response with answer
+   */
+  async askAI(url, question, options = {}) {
+    try {
+      // First scrape the content
+      const scrapeResult = await this.scrape(url, options);
+      if (!scrapeResult.success) {
+        return {
+          success: false,
+          error: `Failed to scrape URL: ${scrapeResult.error}`,
+          method: scrapeResult.method
+        };
+      }
+      // Check for OpenRouter/OpenAI API key
+      const openRouterKey = options.openRouterApiKey || this.options.openRouterApiKey || process.env.OPENROUTER_API_KEY;
+      const openAIKey = options.openAIApiKey || this.options.openAIApiKey || process.env.OPENAI_API_KEY;
+      // Priority: OpenRouter > OpenAI > Backend API > Local
+      if (openRouterKey) {
+        try {
+          const answer = await this.processWithOpenRouter(question, scrapeResult.content, openRouterKey, options);
+          return {
+            success: true,
+            answer,
+            method: scrapeResult.method,
+            scrapeTime: scrapeResult.stats.totalTime,
+            processing: 'openrouter'
+          };
+        } catch (error) {
+          this.log('  ⚠️ OpenRouter API call failed, falling back...');
+        }
+      }
+      if (openAIKey) {
+        try {
+          const answer = await this.processWithOpenAI(question, scrapeResult.content, openAIKey, options);
+          return {
+            success: true,
+            answer,
+            method: scrapeResult.method,
+            scrapeTime: scrapeResult.stats.totalTime,
+            processing: 'openai'
+          };
+        } catch (error) {
+          this.log('  ⚠️ OpenAI API call failed, falling back...');
+        }
+      }
+      // If BNCA API key is provided, use the backend API
+      if (this.options.apiKey) {
+        try {
+          const response = await fetch(`${this.options.apiUrl || 'https://bnca-api.fly.dev'}/aireply`, {
+            method: 'POST',
+            headers: {
+              'x-api-key': this.options.apiKey,
+              'Content-Type': 'application/json'
+            },
+            body: JSON.stringify({ url, question })
+          });
+          if (response.ok) {
+            const data = await response.json();
+            return {
+              success: true,
+              answer: data.answer,
+              method: scrapeResult.method,
+              scrapeTime: scrapeResult.stats.totalTime,
+              processing: 'backend'
+            };
+          }
+        } catch (error) {
+          this.log('  ⚠️ Backend API call failed, using local AI processing');
+        }
+      }
+      // Local AI processing fallback
+      const answer = this.processLocally(question, scrapeResult.content);
+      return {
+        success: true,
+        answer,
+        method: scrapeResult.method,
+        scrapeTime: scrapeResult.stats.totalTime,
+        processing: 'local'
+      };
+    } catch (error) {
+      return {
+        success: false,
+        error: error.message || 'AI processing failed'
+      };
+    }
+  }
+  /**
+   * Process with OpenRouter API
+   * @private
+   */
+  async processWithOpenRouter(question, content, apiKey, options = {}) {
+    const parsedContent = typeof content === 'string' ? JSON.parse(content) : content;
+    const contentText = `
+Title: ${parsedContent.title || 'Unknown'}
+Content: ${parsedContent.content || parsedContent.bodyText || 'No content available'}
+Meta Description: ${parsedContent.metaDescription || 'None'}
+${parsedContent.headings?.length ? `\nHeadings:\n${parsedContent.headings.map(h => `- ${h.text || h}`).join('\n')}` : ''}
+`.trim();
+    const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'Authorization': `Bearer ${apiKey}`,
+        'HTTP-Referer': options.referer || 'https://github.com/monostate/node-scraper',
+        'X-Title': 'BNCA Node Scraper',
+      },
+      body: JSON.stringify({
+        model: options.model || 'meta-llama/llama-4-scout:free',
+        messages: [
+          {
+            role: 'system',
+            content: 'You are a helpful assistant that answers questions based on website content. Provide accurate, concise answers based only on the provided content.'
+          },
+          {
+            role: 'user',
+            content: `Based on the following website content, please answer this question: ${question}\n\nWebsite content:\n${contentText}`
+          }
+        ],
+        temperature: options.temperature || 0.3,
+        max_tokens: options.maxTokens || 500,
+      }),
+    });
+    if (!response.ok) {
+      throw new Error(`OpenRouter API error: ${response.status}`);
+    }
+    const data = await response.json();
+    return data.choices[0]?.message?.content || 'No response from AI';
+  }
+  /**
+   * Process with OpenAI API
+   * @private
+   */
+  async processWithOpenAI(question, content, apiKey, options = {}) {
+    const parsedContent = typeof content === 'string' ? JSON.parse(content) : content;
+    const contentText = `
+Title: ${parsedContent.title || 'Unknown'}
+Content: ${parsedContent.content || parsedContent.bodyText || 'No content available'}
+Meta Description: ${parsedContent.metaDescription || 'None'}
+${parsedContent.headings?.length ? `\nHeadings:\n${parsedContent.headings.map(h => `- ${h.text || h}`).join('\n')}` : ''}
+`.trim();
+    const baseUrl = options.openAIBaseUrl || 'https://api.openai.com';
+    const response = await fetch(`${baseUrl}/v1/chat/completions`, {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'Authorization': `Bearer ${apiKey}`,
+      },
+      body: JSON.stringify({
+        model: options.model || 'gpt-3.5-turbo',
+        messages: [
+          {
+            role: 'system',
+            content: 'You are a helpful assistant that answers questions based on website content. Provide accurate, concise answers based only on the provided content.'
+          },
+          {
+            role: 'user',
+            content: `Based on the following website content, please answer this question: ${question}\n\nWebsite content:\n${contentText}`
+          }
+        ],
+        temperature: options.temperature || 0.3,
+        max_tokens: options.maxTokens || 500,
+      }),
+    });
+    if (!response.ok) {
+      throw new Error(`OpenAI API error: ${response.status}`);
+    }
+    const data = await response.json();
+    return data.choices[0]?.message?.content || 'No response from AI';
+  }
+  /**
+   * Local AI processing (simple pattern matching)
+   * @private
+   */
+  processLocally(question, content) {
+    const parsedContent = typeof content === 'string' ?
+      JSON.parse(content) : content;
+    const title = parsedContent.title || 'Unknown';
+    const text = parsedContent.content || parsedContent.bodyText || '';
+    const lowerQuestion = question.toLowerCase();
+    if (lowerQuestion.includes('title')) {
+      return `The page title is "${title}".`;
+    }
+    if (lowerQuestion.includes('about') || lowerQuestion.includes('what')) {
+      return `This page titled "${title}" contains: ${text.substring(0, 200)}...`;
+    }
+    if (lowerQuestion.includes('contact') || lowerQuestion.includes('email')) {
+      const emailMatch = text.match(/[\w.-]+@[\w.-]+\.\w+/);
+      return emailMatch ?
+        `Found contact: ${emailMatch[0]}` :
+        'No contact information found.';
+    }
+    return `Based on "${title}": ${text.substring(0, 150)}...`;
+  }
   /**
    * Main scraping method with intelligent fallback
    */
@@ -60,6 +288,35 @@ export class BNCASmartScraper {
     let lastError = null;
     try {
+      // Check if URL is a PDF (by extension or content-type check)
+      const isPdfUrl = url.toLowerCase().endsWith('.pdf') ||
+                       url.toLowerCase().includes('.pdf?') ||
+                       url.toLowerCase().includes('/pdf/');
+      if (isPdfUrl) {
+        this.log('  📄 PDF detected, using PDF parser...');
+        result = await this.tryPDFParse(url, config);
+        if (result.success) {
+          method = 'pdf';
+          this.log('  ✅ PDF parsing successful');
+          const totalTime = Date.now() - startTime;
+          return {
+            ...result,
+            method,
+            performance: {
+              totalTime,
+              method
+            },
+            stats: this.getStats()
+          };
+        } else {
+          this.log('  ❌ PDF parsing failed');
+          lastError = result.error;
+        }
+      }
       // Step 1: Try direct fetch first (fastest)
       this.log('  🔄 Attempting direct fetch...');
       result = await this.tryDirectFetch(url, config);
@@ -67,6 +324,29 @@ export class BNCASmartScraper {
       if (result.success && !result.needsBrowser) {
         method = 'direct-fetch';
         this.log('  ✅ Direct fetch successful');
+      } else if (result.isPdf) {
+        // Direct fetch detected a PDF, try PDF parser
+        this.log('  📄 Direct fetch detected PDF content, using PDF parser...');
+        result = await this.tryPDFParse(url, config);
+        if (result.success) {
+          method = 'pdf';
+          this.log('  ✅ PDF parsing successful');
+          const totalTime = Date.now() - startTime;
+          return {
+            ...result,
+            method,
+            performance: {
+              totalTime,
+              method
+            },
+            stats: this.getStats()
+          };
+        } else {
+          this.log('  ❌ PDF parsing failed');
+          lastError = result.error;
+        }
       } else {
         this.log(result.needsBrowser ? '  ⚠️  Browser rendering required' : '  ❌ Direct fetch failed');
         lastError = result.error;
@@ -152,7 +432,32 @@ export class BNCASmartScraper {
         };
       }
-      const html = await response.text();
+      // Check if the response is actually a PDF
+      const contentType = response.headers.get('content-type') || '';
+      if (contentType.includes('application/pdf')) {
+        return {
+          success: false,
+          error: 'Content is PDF, should use PDF parser',
+          isPdf: true
+        };
+      }
+      // Get response as array buffer to check magic bytes
+      const buffer = await response.arrayBuffer();
+      const firstBytes = new Uint8Array(buffer.slice(0, 5));
+      const signature = Array.from(firstBytes).map(b => String.fromCharCode(b)).join('');
+      // Check for PDF magic bytes
+      if (signature.startsWith('%PDF')) {
+        return {
+          success: false,
+          error: 'Content is PDF (detected by magic bytes), should use PDF parser',
+          isPdf: true
+        };
+      }
+      // Convert buffer back to text for HTML processing
+      const html = new TextDecoder().decode(buffer);
       // Intelligent browser detection
       const needsBrowser = this.detectBrowserRequirement(html, url);
@@ -390,6 +695,95 @@ export class BNCASmartScraper {
     }
   }
+  /**
+   * PDF parsing method - handles PDF documents
+   */
+  async tryPDFParse(url, config) {
+    this.stats.pdf.attempts++;
+    try {
+      // Download PDF with timeout
+      const controller = new AbortController();
+      const timeoutId = setTimeout(() => controller.abort(), config.timeout);
+      const response = await fetch(url, {
+        headers: {
+          'User-Agent': config.userAgent,
+          'Accept': 'application/pdf,*/*'
+        },
+        signal: controller.signal
+      });
+      clearTimeout(timeoutId);
+      if (!response.ok) {
+        return {
+          success: false,
+          error: `HTTP ${response.status}: ${response.statusText}`
+        };
+      }
+      // Check content type (be lenient - accept various content types)
+      const contentType = response.headers.get('content-type') || '';
+      const acceptableTypes = ['pdf', 'octet-stream', 'binary', 'download'];
+      const isAcceptableType = acceptableTypes.some(type => contentType.includes(type));
+      if (!isAcceptableType && !url.toLowerCase().includes('.pdf')) {
+        return {
+          success: false,
+          error: `Not a PDF document: ${contentType}`
+        };
+      }
+      // Get PDF buffer
+      const arrayBuffer = await response.arrayBuffer();
+      const buffer = Buffer.from(arrayBuffer);
+      // Check size limit (20MB)
+      if (buffer.length > 20 * 1024 * 1024) {
+        return {
+          success: false,
+          error: 'PDF too large (max 20MB)'
+        };
+      }
+      // Parse PDF
+      const pdfData = await pdfParse(buffer);
+      // Extract structured content
+      const content = {
+        title: pdfData.info?.Title || 'Untitled PDF',
+        author: pdfData.info?.Author || '',
+        subject: pdfData.info?.Subject || '',
+        keywords: pdfData.info?.Keywords || '',
+        creator: pdfData.info?.Creator || '',
+        producer: pdfData.info?.Producer || '',
+        creationDate: pdfData.info?.CreationDate || '',
+        modificationDate: pdfData.info?.ModificationDate || '',
+        pages: pdfData.numpages || 0,
+        text: pdfData.text || '',
+        metadata: pdfData.metadata || null,
+        url: url
+      };
+      this.stats.pdf.successes++;
+      return {
+        success: true,
+        content: JSON.stringify(content, null, 2),
+        size: buffer.length,
+        contentType: 'application/pdf',
+        pages: content.pages
+      };
+    } catch (error) {
+      return {
+        success: false,
+        error: `PDF parsing error: ${error.message}`
+      };
+    }
+  }
   /**
    * Intelligent detection of browser requirement
    */
@@ -633,7 +1027,9 @@ export class BNCASmartScraper {
         lightpanda: this.stats.lightpanda.attempts > 0 ?
           (this.stats.lightpanda.successes / this.stats.lightpanda.attempts * 100).toFixed(1) + '%' : '0%',
         puppeteer: this.stats.puppeteer.attempts > 0 ?
-          (this.stats.puppeteer.successes / this.stats.puppeteer.attempts * 100).toFixed(1) + '%' : '0%'
+          (this.stats.puppeteer.successes / this.stats.puppeteer.attempts * 100).toFixed(1) + '%' : '0%',
+        pdf: this.stats.pdf.attempts > 0 ?
+          (this.stats.pdf.successes / this.stats.pdf.attempts * 100).toFixed(1) + '%' : '0%'
       }
     };
   }
@@ -995,4 +1391,16 @@ export async function quickShot(url, options = {}) {
   }
 }
+export async function askWebsiteAI(url, question, options = {}) {
+  const scraper = new BNCASmartScraper(options);
+  try {
+    const result = await scraper.askAI(url, question, options);
+    return result;
+  } catch (error) {
+    throw error;
+  } finally {
+    await scraper.cleanup();
+  }
+}
 export default BNCASmartScraper;
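
Review note: both `processWithOpenRouter` and `processWithOpenAI` in the hunks above build the request body with `options.temperature || 0.3` and `options.maxTokens || 500`. Because `||` falls through on any falsy value, a caller who passes `temperature: 0` would get `0.3` instead. The sketch below contrasts that with the nullish coalescing operator (`??`), which only falls through on `null` or `undefined`. Both helper names are hypothetical and exist only to show the difference; this is not a patch taken from the package.

```javascript
// Mirrors the defaulting pattern used in the diff: any falsy value
// (including 0) is replaced by the default.
function samplingWithOr(options = {}) {
  return {
    temperature: options.temperature || 0.3,
    max_tokens: options.maxTokens || 500
  };
}

// Alternative sketch: only null/undefined trigger the default, so an
// explicit 0 is preserved.
function samplingWithNullish(options = {}) {
  return {
    temperature: options.temperature ?? 0.3,
    max_tokens: options.maxTokens ?? 500
  };
}

console.log(samplingWithOr({ temperature: 0 }).temperature);      // 0.3
console.log(samplingWithNullish({ temperature: 0 }).temperature); // 0
console.log(samplingWithNullish({}).temperature);                 // 0.3
```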

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@monostate/node-scraper",
-  "version": "1.2.0",
-  "description": "Intelligent web scraping with multi-level fallback system - 11.35x faster than Firecrawl",
+  "version": "1.5.0",
+  "description": "Intelligent web scraping with AI Q&A, PDF support and multi-level fallback system - 11x faster than traditional scrapers",
   "type": "module",
   "main": "index.js",
   "types": "index.d.ts",
@@ -19,6 +19,9 @@
     "scripts/",
     "bin/"
   ],
+  "scripts": {
+    "postinstall": "node scripts/install-lightpanda.js"
+  },
   "keywords": [
     "web-scraping",
     "crawling",
@@ -29,6 +32,12 @@
     "data-extraction",
     "automation",
     "browser",
+    "ai-powered",
+    "question-answering",
+    "pdf-parsing",
+    "openrouter",
+    "openai",
+    "llm",
     "nextjs",
     "react",
     "performance",
@@ -37,7 +46,8 @@
   "author": "BNCA Team",
   "license": "MIT",
   "dependencies": {
-    "node-fetch": "^3.3.2"
+    "node-fetch": "^3.3.2",
+    "pdf-parse": "^1.1.1"
   },
   "peerDependencies": {
     "puppeteer": ">=20.0.0"
@@ -65,8 +75,5 @@
   },
   "publishConfig": {
     "access": "public"
-  },
-  "scripts": {
-    "postinstall": "node scripts/install-lightpanda.js"
   }
 }

package/LICENSE DELETED

@@ -1,21 +0,0 @@
-MIT License
-Copyright (c) 2025 BNCA Team
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.