npm - spectrawl - Versions diffs - 0.3.10 → 0.3.12 - Mend

spectrawl 0.3.10 → 0.3.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md +94 -62
package/package.json +1 -1
package/src/act/adapters/devto.js +18 -3
package/src/search/index.js +11 -9
package/src/search/summarizer.js +2 -2

package/README.md CHANGED

@@ -2,7 +2,7 @@
 The unified web layer for AI agents. Search, browse, authenticate, and act on platforms — one tool, self-hosted, free.
-**Free Tavily alternative** with Google-quality results via Gemini Grounded Search.
+**5,000 free searches/month** with Google-quality results via Gemini Grounded Search. Better answers than Tavily. Self-hosted.
 ## What It Does
@@ -12,6 +12,40 @@ AI agents need to interact with the web. That means searching, browsing pages, l
 npm install spectrawl
 ```
+## Real Output
+Here's actual output from Spectrawl vs Tavily on the same query:
+**Query:** `"best open source AI agent frameworks 2025"`
+### Spectrawl (free)
+```
+Time: 16.8s | Sources: 19
+Answer: The leading open-source AI agent frameworks for 2025 include AutoGen,
+CrewAI, LangChain, LangGraph, and Semantic Kernel [1, 2, 3]. AutoGen is
+recognized for enabling complex multi-agent conversations, while CrewAI
+focuses on orchestrating collaborative AI agents [1, 2]. LangChain and its
+component LangGraph provide robust tools for building sophisticated agent
+workflows and state management [1, 2, 3]. Semantic Kernel, developed by
+Microsoft, integrates large language models with conventional programming
+languages [1, 2, 3].
+Other prominent frameworks include LlamaIndex, Haystack, BabyAGI, AgentGPT,
+SuperAGI, MetaGPT, and Open Interpreter [1, 2].
+```
+**12 frameworks named, inline citations, 19 sources**
+### Tavily ($0.01/query)
+```
+Time: 2s | Sources: 10
+Answer: In 2025, LangGraph and Microsoft's AutoGen + Semantic Kernel are
+top open-source AI agent frameworks, favored for their robust orchestration
+and enterprise security features.
+```
+**3 frameworks named, no citations, 10 sources**
 ## Quick Start
 ```bash
@@ -23,32 +57,40 @@ export GEMINI_API_KEY=your-free-key  # Get one at aistudio.google.com
 const { Spectrawl } = require('spectrawl')
 const web = new Spectrawl()
-// Deep search — like Tavily but free
-const result = await web.deepSearch('best AI agent frameworks 2025')
-console.log(result.answer)    // AI-generated answer with citations
+// Deep search — returns sources for your agent/LLM to process
+const result = await web.deepSearch('how to build an MCP server in Node.js')
 console.log(result.sources)   // [{ title, url, content, score }]
-// Fast mode — snippets only, ~6s
+// With AI summary (opt-in — uses extra Gemini call)
+const withAnswer = await web.deepSearch('query', { summarize: true })
+console.log(withAnswer.answer)  // AI-generated answer with [1] [2] citations
+// Fast mode — snippets only, skip scraping
 const fast = await web.deepSearch('query', { mode: 'fast' })
-// Basic search — raw results, no AI
+// Basic search — raw results
 const basic = await web.search('query')
 ```
-### vs Tavily
+> **Why no summary by default?** Your agent already has an LLM. If we summarize AND your agent summarizes, you're paying two LLMs for one answer. We return rich sources — your agent does the rest.
+## vs Tavily
 | | Tavily | Spectrawl |
 |---|---|---|
-| Speed | ~2s | ~6-9s |
-| Search quality | Google index | Google via Gemini ✅ |
-| Results per query | 10 | 12-16 ✅ |
-| Citations | ✅ | ✅ |
-| Cost | $0.01/query | **Free** ✅ |
+| Speed | ~2s ✅ | ~7-17s |
+| Answer quality | Generic (3 items) | **Detailed** (12+ items) ✅ |
+| Inline citations | ❌ | **[1] [2] [3]** ✅ |
+| Results per query | 10 | **12-19** ✅ |
+| Cost | $0.01/query | **Free** (5K/mo) ✅ |
 | Self-hosted | No | **Yes** ✅ |
+| Source ranking | No | **Domain trust scoring** ✅ |
 | Stealth scraping | No | **Yes** ✅ |
 | Auth + posting | No | **24 adapters** ✅ |
 | Cached repeats | No | **<1ms** ✅ |
+Spectrawl wins on answer quality, result volume, features, and cost. Tavily wins on speed.
 ## Search
 Default cascade: **Gemini Grounded → Brave → DDG**
@@ -70,10 +112,10 @@ Gemini Grounded Search gives you Google-quality results through the Gemini API.
 ```
 Query → Gemini Grounded + DDG (parallel)
-  → Merge & deduplicate (12-16 results)
+  → Merge & deduplicate (12-19 results)
   → Source quality ranking (boost GitHub/SO/Reddit, penalize SEO spam)
   → Parallel scraping (Jina → readability → Playwright fallback)
-  → AI summarization with [1] [2] citations
+  → Returns sources to your agent (AI summary opt-in with summarize: true)
 ```
 ### What you get without any keys
@@ -92,56 +134,55 @@ Stealth browsing with anti-detection. Three tiers (auto-detected):
 const page = await web.browse('https://example.com')
 console.log(page.content)       // extracted text/markdown
 console.log(page.screenshot)    // PNG buffer (if requested)
-// With screenshot
-const page = await web.browse('https://example.com', { screenshot: true })
 ```
 Auto-fallback: if Jina and readability return too little content (<200 chars), Spectrawl renders the page with Playwright and extracts from the rendered DOM. Tavily can't do this — they fail on JS-heavy pages.
 ## Auth
-Persistent cookie storage (SQLite), multi-account management, automatic refresh.
+Persistent cookie storage (SQLite), multi-account management, automatic expiry detection.
 ```js
-// Store cookies
-await web.auth.setCookies('x', '@myhandle', cookies)
+// Add account
+await web.auth.add('x', { account: '@myhandle', method: 'cookie', cookies })
 // Check health
-const accounts = await web.status()
+const accounts = await web.auth.getStatus()
 // [{ platform: 'x', account: '@myhandle', status: 'valid', expiresAt: '...' }]
 ```
+Cookie refresh cron fires `cookie_expiring` and `cookie_expired` events before accounts go stale.
 ## Act — 24 Platform Adapters
-Post to 30+ platforms with one API:
+Post to 24+ platforms with one API:
 ```js
-await web.act('x', 'post', { text: 'Hello from Spectrawl', account: '@myhandle' })
+await web.act('github', 'create-issue', { repo: 'user/repo', title: 'Bug report', body: '...' })
 await web.act('reddit', 'post', { subreddit: 'node', title: '...', text: '...' })
-await web.act('github', 'create-repo', { name: 'my-repo', description: '...' })
+await web.act('devto', 'post', { title: '...', body: '...', tags: ['ai'] })
+await web.act('huggingface', 'create-repo', { name: 'my-model', type: 'model' })
 ```
+**Live tested:** GitHub ✅, Reddit ✅, Dev.to ✅, HuggingFace ✅, X (reads) ✅
 | Platform | Auth Method | Actions |
 |----------|-------------|---------|
-| X/Twitter | GraphQL Cookie + OAuth 1.0a | post |
-| Reddit | Cookie API (oauth.reddit.com) | post, comment |
-| Dev.to | REST API | post |
+| X/Twitter | Cookie + OAuth 1.0a | post |
+| Reddit | Cookie API | post, comment, delete |
+| Dev.to | REST API key | post, update |
 | Hashnode | GraphQL API | post |
 | LinkedIn | Cookie API (Voyager) | post |
-| IndieHackers | Browser automation | post, comment, upvote |
-| Medium | REST API | post (markdown) |
+| IndieHackers | Browser automation | post, comment |
+| Medium | REST API | post |
 | GitHub | REST v3 | repo, file, issue, release |
-| Discord | Bot API + webhooks | send, thread |
-| Product Hunt | GraphQL v2 | launch, comment, upvote |
-| Hacker News | Cookie/form POST | submit, comment, upvote |
-| YouTube | Data API v3 | comment, playlist |
-| Quora | Browser automation | answer, question |
+| Discord | Bot API | send, thread |
+| Product Hunt | GraphQL v2 | launch, comment |
+| Hacker News | Cookie API | submit, comment |
+| YouTube | Data API v3 | comment |
+| Quora | Browser automation | answer |
 | HuggingFace | Hub API | repo, model card, upload |
 | BetaList | REST API | submit |
-| AlternativeTo | Browser automation | submit |
-| SaaSHub | Browser automation | submit |
-| DevHunt | Browser automation | submit |
 | **14 Directories** | Generic adapter | submit |
 Built-in rate limiting, content dedup (MD5, 24h window), and dead letter queue for retries.
@@ -156,11 +197,9 @@ Spectrawl ranks results by domain trust — something Tavily doesn't do:
 ```js
 const web = new Spectrawl({
-  search: {
-    sourceRanker: {
-      boost: ['github.com', 'news.ycombinator.com'],
-      block: ['spamsite.com']
-    }
+  sourceRanker: {
+    boost: ['github.com', 'news.ycombinator.com'],
+    block: ['spamsite.com']
   }
 })
 ```
@@ -174,14 +213,14 @@ npx spectrawl serve --port 3900
 ```
 POST /search   { "query": "...", "summarize": true }
 POST /browse   { "url": "...", "screenshot": true }
-POST /act      { "platform": "x", "action": "post", "params": { ... } }
-GET  /status
-GET  /health
+POST /act      { "platform": "github", "action": "create-issue", ... }
+GET  /status   — auth account health
+GET  /health   — server health
 ```
 ## MCP Server
-Works with any MCP-compatible agent framework (Claude, OpenAI, etc.):
+Works with any MCP-compatible agent (Claude, Cursor, OpenClaw, LangChain):
 ```bash
 npx spectrawl mcp
@@ -208,19 +247,12 @@ npx spectrawl install-stealth   # download Camoufox browser
 {
   "search": {
     "cascade": ["gemini-grounded", "brave", "ddg"],
-    "scrapeTop": 3
+    "scrapeTop": 5
   },
   "cache": {
     "searchTtl": 3600,
     "scrapeTtl": 86400
   },
-  "proxy": {
-    "localPort": 8080,
-    "strategy": "round-robin",
-    "upstreams": [
-      { "url": "http://user:pass@proxy.example.com:8080" }
-    ]
-  },
   "rateLimit": {
     "x": { "postsPerHour": 3 },
     "reddit": { "postsPerHour": 5 }
@@ -231,14 +263,14 @@ npx spectrawl install-stealth   # download Camoufox browser
 ## Environment Variables
 ```
-GEMINI_API_KEY      Gemini API key (free — primary search + summarization)
-BRAVE_API_KEY       Brave Search API key (2,000 free/month)
-SERPER_API_KEY      Serper.dev API key
-GOOGLE_CSE_KEY      Google Custom Search API key
-GOOGLE_CSE_CX       Google Custom Search engine ID
-JINA_API_KEY        Jina Reader API key (optional)
-OPENAI_API_KEY      For LLM summarization (alternative to Gemini)
-ANTHROPIC_API_KEY   For LLM summarization (alternative to Gemini)
+GEMINI_API_KEY      Free — primary search + summarization (aistudio.google.com)
+BRAVE_API_KEY       Brave Search (2,000 free/month)
+SERPER_API_KEY      Serper.dev (2,500 trial queries)
+GITHUB_TOKEN        For GitHub adapter
+DEVTO_API_KEY       For Dev.to adapter
+HF_TOKEN            For HuggingFace adapter
+OPENAI_API_KEY      Alternative LLM for summarization
+ANTHROPIC_API_KEY   Alternative LLM for summarization
 ```
 ## License
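
The README hunks above describe a deep-search pipeline that merges results from several engines, deduplicates them, and ranks them by domain trust, but the diff does not include the code that does this. The following is only a rough sketch of how such a step could look, assuming deduplication by normalized URL and a flat score bonus for boosted domains. Every function name here, and the `boostBy` parameter, is hypothetical and not taken from the spectrawl source; only the `boost`/`block` array shape follows the `sourceRanker` example in the README.

```javascript
// Hypothetical sketch: none of these helpers come from the spectrawl source.

// Normalize a URL so trivially different links (www prefix, trailing slash,
// fragment) compare equal.
function normalizeUrl(raw) {
  const u = new URL(raw)
  const host = u.hostname.replace(/^www\./, '')
  const path = u.pathname.replace(/\/+$/, '')
  return `${host}${path}${u.search}`
}

// Merge result lists from several engines, keeping the first hit per URL.
function mergeAndDedupe(...lists) {
  const seen = new Map()
  for (const list of lists) {
    for (const r of list) {
      const key = normalizeUrl(r.url)
      if (!seen.has(key)) seen.set(key, r)
    }
  }
  return [...seen.values()]
}

// Drop blocked domains, add a flat bonus for boosted ones, sort by score.
function rankBySource(results, { boost = [], block = [], boostBy = 0.3 } = {}) {
  return results
    .map(r => {
      const host = new URL(r.url).hostname.replace(/^www\./, '')
      const bonus = boost.includes(host) ? boostBy : 0
      return { ...r, host, score: (r.score || 0) + bonus }
    })
    .filter(r => !block.includes(r.host))
    .sort((a, b) => b.score - a.score)
}

const merged = mergeAndDedupe(
  [{ url: 'https://www.github.com/a/b/', score: 0.5 }, { url: 'https://spamsite.com/x', score: 0.9 }],
  [{ url: 'https://github.com/a/b', score: 0.4 }, { url: 'https://example.com/p', score: 0.6 }]
)
const ranked = rankBySource(merged, { boost: ['github.com'], block: ['spamsite.com'] })
console.log(ranked.map(r => r.host)) // [ 'github.com', 'example.com' ]
```

A flat bonus is the simplest possible policy; a real ranker may weight domains individually or penalize rather than block low-quality hosts.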

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "spectrawl",
-  "version": "0.3.10",
+  "version": "0.3.12",
   "description": "The unified web layer for AI agents. Search (6 engines), stealth browse (Camoufox + Playwright), auth (cookies, multi-account), act (24 adapters, 30+ platforms), proxy rotation. Self-hosted, free.",
   "main": "src/index.js",
   "types": "index.d.ts",

package/src/act/adapters/devto.js CHANGED

@@ -82,16 +82,31 @@ function jsonRequest(method, url, body, headers) {
     const urlObj = new URL(url)
     const opts = {
       hostname: urlObj.hostname,
-      path: urlObj.pathname,
+      path: urlObj.pathname + urlObj.search,
       method,
-      headers: { ...headers, 'Content-Length': Buffer.byteLength(body) }
+      headers: {
+        ...headers,
+        'Content-Length': Buffer.byteLength(body),
+        'User-Agent': 'Spectrawl/0.3',
+        'Accept': 'application/json'
+      }
     }
     const req = https.request(opts, res => {
+      // Handle redirects
+      if ([301, 302, 307, 308].includes(res.statusCode) && res.headers.location) {
+        return jsonRequest(method, res.headers.location, body, headers).then(resolve).catch(reject)
+      }
       let data = ''
       res.on('data', c => data += c)
       res.on('end', () => {
+        if (!data && (res.statusCode >= 200 && res.statusCode < 300)) {
+          return resolve({ success: true, statusCode: res.statusCode })
+        }
+        if (res.statusCode >= 400) {
+          return reject(new Error(`Dev.to API ${res.statusCode}: ${data.slice(0, 200)}`))
+        }
         try { resolve(JSON.parse(data)) }
-        catch (e) { reject(new Error(`Invalid Dev.to response: ${data.slice(0, 200)}`)) }
+        catch (e) { reject(new Error(`Invalid Dev.to response (${res.statusCode}): ${data.slice(0, 200)}`)) }
       })
     })
     req.on('error', reject)
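
The response handling added in this hunk checks four cases in a fixed order: redirect, empty 2xx body, HTTP error, then JSON parse. The sketch below restates that order as a pure function so each branch can be exercised without a network call. `classifyResponse` is a hypothetical helper written for illustration, not part of the adapter, and the example URL is only illustrative. The last lines show why the `path` fix matters: `pathname` alone drops the query string.

```javascript
// Hypothetical helper mirroring the branch order of the hunk above.
function classifyResponse(statusCode, headers, data) {
  // 1. Redirects are followed only when a Location header is present.
  if ([301, 302, 307, 308].includes(statusCode) && headers.location) {
    return { action: 'redirect', location: headers.location }
  }
  // 2. An empty 2xx body (for example 204) counts as success.
  if (!data && statusCode >= 200 && statusCode < 300) {
    return { action: 'resolve', value: { success: true, statusCode } }
  }
  // 3. Any 4xx/5xx is rejected before JSON parsing is attempted.
  if (statusCode >= 400) {
    return { action: 'reject', message: `Dev.to API ${statusCode}: ${data.slice(0, 200)}` }
  }
  // 4. Everything else must be valid JSON.
  try {
    return { action: 'resolve', value: JSON.parse(data) }
  } catch (e) {
    return { action: 'reject', message: `Invalid Dev.to response (${statusCode}): ${data.slice(0, 200)}` }
  }
}

console.log(classifyResponse(204, {}, '').action)       // resolve
console.log(classifyResponse(422, {}, '{"e":1}').action) // reject

// The path fix: pathname alone loses the query string.
const u = new URL('https://dev.to/api/articles?page=2')
console.log(u.pathname)            // /api/articles
console.log(u.pathname + u.search) // /api/articles?page=2
```

As shown in the hunk, the recursive redirect call has no hop limit and passes `Location` straight to `new URL`, so a relative `Location` value would throw. Whether Dev.to ever returns one is not something this diff establishes.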

package/src/search/index.js CHANGED

@@ -29,7 +29,7 @@ class SearchEngine {
     this.config = config
     this.cache = cache
     this.cascade = config.cascade || ['ddg', 'brave', 'serper']
-    this.scrapeTop = config.scrapeTop || 3
+    this.scrapeTop = config.scrapeTop || 5
     this.summarizer = config.llm ? new Summarizer(config.llm) : null
     // Gemini-powered features (free tier)
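
One side effect of the `config.scrapeTop || 5` default in this hunk: because `0` is falsy, a caller passing `scrapeTop: 0` gets the default of 5 rather than zero scraped pages. The diff does not say whether `0` is meant to be a supported value, so the snippet below only isolates the expression to show how `||` differs from `??`. The `withOr` and `withNullish` names are illustrative.

```javascript
// Illustrative only: isolates the defaulting expression from the constructor.
const withOr = config => config.scrapeTop || 5
const withNullish = config => config.scrapeTop ?? 5

console.log(withOr({}))                    // 5
console.log(withOr({ scrapeTop: 0 }))      // 5 (0 is falsy, so the default wins)
console.log(withNullish({ scrapeTop: 0 })) // 0
```
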
@@ -188,16 +188,18 @@ class SearchEngine {
       }
     }
-    // Step 6: Summarize with citations
+    // Step 6: Summarize with citations (opt-in — most agents have their own LLM)
     let answer = null
-    const summarizer = this.summarizer || (this.reranker ? new Summarizer({
-      provider: 'gemini',
-      model: 'gemini-2.5-flash',
-      apiKey: process.env.GEMINI_API_KEY
-    }) : null)
+    if (opts.summarize === true) {
+      const summarizer = this.summarizer || (process.env.GEMINI_API_KEY ? new Summarizer({
+        provider: 'gemini',
+        model: 'gemini-2.5-flash',
+        apiKey: process.env.GEMINI_API_KEY
+      }) : null)
-    if (summarizer) {
-      answer = await summarizer.summarize(query, results)
+      if (summarizer) {
+        answer = await summarizer.summarize(query, results)
+      }
     }
     const response = {
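
Two things change in Step 6: summarization now runs only when `opts.summarize === true`, and the Gemini fallback is gated on `process.env.GEMINI_API_KEY` instead of `this.reranker`. The sketch below restates that selection logic as a standalone function. `pickSummarizer` is a hypothetical name, and it returns a plain options object rather than a `Summarizer` instance because the class is not shown in full in this diff.

```javascript
// Hypothetical restatement of the Step 6 selection logic after the change.
// `configured` stands in for this.summarizer; `geminiKey` for process.env.GEMINI_API_KEY.
function pickSummarizer(opts, configured, geminiKey) {
  // Strict comparison: only the boolean true opts in.
  if (opts.summarize !== true) return null
  // An explicitly configured LLM takes priority over the Gemini fallback.
  if (configured) return configured
  return geminiKey
    ? { provider: 'gemini', model: 'gemini-2.5-flash', apiKey: geminiKey }
    : null
}

console.log(pickSummarizer({}, null, 'key'))                      // null
console.log(pickSummarizer({ summarize: 'true' }, null, 'key'))   // null
console.log(pickSummarizer({ summarize: true }, null, 'key').model) // gemini-2.5-flash
```

Because the check is a strict `=== true`, truthy values such as the string `'true'` or `1` leave `answer` as `null`. That matters for callers that forward query-string parameters unchanged.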

package/src/search/summarizer.js CHANGED

@@ -39,8 +39,8 @@ class Summarizer {
     if (!this.apiKey) return null
     const context = sources
-      .slice(0, 5)
-      .map((s, i) => `[${i + 1}] ${s.title}\n${s.url}\n${(s.fullContent || s.snippet || '').slice(0, 1000)}`)
+      .slice(0, 8)
+      .map((s, i) => `[${i + 1}] ${s.title}\n${s.url}\n${(s.fullContent || s.snippet || '').slice(0, 1500)}`)
       .join('\n\n')
     const prompt = `Answer this question directly: "${query}"
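
The change above raises the summarizer's input from 5 sources at 1,000 characters each to 8 sources at 1,500 characters each, so the upper bound on source text in the prompt grows from 5,000 to 12,000 characters (titles and URLs excluded). The sketch below lifts the same expression into a standalone function with the two limits as parameters to make that arithmetic checkable. `buildContext` and its parameter names are hypothetical, and the sample sources are synthetic.

```javascript
// Hypothetical wrapper around the context-building expression from the hunk,
// with the two limits exposed as parameters.
function buildContext(sources, maxSources = 8, maxChars = 1500) {
  return sources
    .slice(0, maxSources)
    .map((s, i) => `[${i + 1}] ${s.title}\n${s.url}\n${(s.fullContent || s.snippet || '').slice(0, maxChars)}`)
    .join('\n\n')
}

// Synthetic data: ten sources whose body is 3,000 'x' characters each.
const sources = Array.from({ length: 10 }, (_, i) => ({
  title: `T${i}`,
  url: `https://e.test/${i}`,
  fullContent: 'x'.repeat(3000)
}))

const countX = s => s.split('x').length - 1
console.log(countX(buildContext(sources, 5, 1000))) // 5000  (limits in 0.3.10)
console.log(countX(buildContext(sources)))          // 12000 (limits in 0.3.12)
```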