spectrawl 0.3.7 → 0.3.9
- package/README.md +146 -96
- package/index.d.ts +87 -69
- package/package.json +1 -1
- package/src/search/summarizer.js +13 -1
package/README.md
CHANGED
@@ -2,6 +2,8 @@
 
 The unified web layer for AI agents. Search, browse, authenticate, and act on platforms — one tool, self-hosted, free.
 
+**Free Tavily alternative** with Google-quality results via Gemini Grounded Search.
+
 ## What It Does
 
 AI agents need to interact with the web. That means searching, browsing pages, logging into platforms, and posting content. Today you duct-tape together Playwright + Tavily + cookie managers + platform-specific scripts. Spectrawl replaces all of that.
@@ -10,123 +12,121 @@ AI agents need to interact with the web. That means searching, browsing pages, l
 npm install spectrawl
 ```
 
-**Search** — 6 engines in a cascade: SearXNG → DuckDuckGo → Brave → Serper → Google CSE → Jina. Tries free/unlimited first, falls through to quota-based. Dual scraping (Jina Reader + readability). Optional LLM summarization.
-
-**Browse** — Stealth browsing with anti-detection out of the box. Three tiers:
-1. `playwright-extra` + stealth plugin (default, works immediately)
-2. Camoufox binary — engine-level anti-fingerprint (`npx spectrawl install-stealth`)
-3. Remote Camoufox service (for existing deployments)
-
-**Auth** — Persistent cookie storage (SQLite), multi-account management, automatic cookie refresh, expiry alerts.
-
-**Act** — 24 platform adapters covering 30+ sites:
-- **Content platforms:** X, Reddit, LinkedIn, Dev.to, Hashnode, IndieHackers, Medium, Hacker News, Quora
-- **Developer:** GitHub (repos, issues, releases), HuggingFace (models, datasets), Discord (bot + webhooks)
-- **Launch/SEO:** Product Hunt, BetaList, AlternativeTo, SaaSHub, DevHunt, AppSumo
-- **Directories:** Generic adapter for MicroLaunch, Uneed, Peerlist, Fazier, BetaPage, LaunchingNext, StartupStash, SideProjectors, TAIFT, Futurepedia, Crunchbase, G2, StackShare, YouTube
-- Rate limiting, content dedup, dead letter queue for retries.
-
-**Proxy** — Rotating proxy server. One endpoint (`localhost:8080`) for all your tools. Round-robin, random, or least-used strategies. Health checking with auto-failover.
-
 ## Quick Start
 
 ```bash
 npm install spectrawl
-
-npx spectrawl search "your query"
+export GEMINI_API_KEY=your-free-key # Get one at aistudio.google.com
 ```
 
-### As a Library
-
 ```js
 const { Spectrawl } = require('spectrawl')
 const web = new Spectrawl()
 
-//
-const
-console.log(
-console.log(
+// Deep search — like Tavily but free
+const result = await web.deepSearch('best AI agent frameworks 2025')
+console.log(result.answer) // AI-generated answer with citations
+console.log(result.sources) // [{ title, url, content, score }]
 
-//
-const
-console.log(page.content) // extracted text
-console.log(page.engine) // 'stealth-playwright' or 'camoufox'
-
-// Act on platforms
-await web.act('x', 'post', {
-  text: 'Hello from Spectrawl',
-  account: '@myhandle'
-})
+// Fast mode — snippets only, ~6s
+const fast = await web.deepSearch('query', { mode: 'fast' })
 
-//
-const
-// [{ platform: 'x', account: '@myhandle', status: 'valid', expiresAt: '...' }]
+// Basic search — raw results, no AI
+const basic = await web.search('query')
 ```
 
-###
+### vs Tavily
 
-
-
-
+| | Tavily | Spectrawl |
+|---|---|---|
+| Speed | ~2s | ~6-9s |
+| Search quality | Google index | Google via Gemini ✅ |
+| Results per query | 10 | 12-16 ✅ |
+| Citations | ✅ | ✅ |
+| Cost | $0.01/query | **Free** ✅ |
+| Self-hosted | No | **Yes** ✅ |
+| Stealth scraping | No | **Yes** ✅ |
+| Auth + posting | No | **24 adapters** ✅ |
+| Cached repeats | No | **<1ms** ✅ |
 
-
-POST /search { "query": "...", "summarize": true }
-POST /browse { "url": "...", "screenshot": true }
-POST /act { "platform": "x", "action": "post", "params": { "text": "..." } }
-GET /status
-GET /health
-```
+## Search
 
-
+Default cascade: **Gemini Grounded → Brave → DDG**
 
-
+Gemini Grounded Search gives you Google-quality results through the Gemini API. Free tier: 5,000 grounded queries/month.
 
-
-
+| Engine | Free Tier | Key Required | Default |
+|--------|-----------|-------------|---------|
+| **Gemini Grounded** | 5,000/month | `GEMINI_API_KEY` | ✅ Primary |
+| Brave | 2,000/month | `BRAVE_API_KEY` | ✅ Fallback |
+| DuckDuckGo | Unlimited | None | ✅ Last resort |
+| Bing | Unlimited | None | Available |
+| Serper | 2,500 trial | `SERPER_API_KEY` | Available |
+| Google CSE | 100/day | `GOOGLE_CSE_KEY` | Available |
+| Jina Reader | Unlimited | None | Available |
+| SearXNG | Unlimited | Self-hosted | Available |
+
+### Deep Search Pipeline
+
+```
+Query → Gemini Grounded + DDG (parallel)
+→ Merge & deduplicate (12-16 results)
+→ Source quality ranking (boost GitHub/SO/Reddit, penalize SEO spam)
+→ Parallel scraping (Jina → readability → Playwright fallback)
+→ AI summarization with [1] [2] citations
 ```
 
-
+### What you get without any keys
 
-
+DDG-only search, raw results, no AI answer. Works from home IPs. Datacenter IPs get rate-limited by DDG — recommend at minimum a free Gemini key.
 
-
+## Browse
 
-
+Stealth browsing with anti-detection. Three tiers (auto-detected):
 
-
-npx spectrawl install-stealth
+1. **playwright-extra + stealth plugin** — default, works immediately
+2. **Camoufox binary** — engine-level anti-fingerprint (`npx spectrawl install-stealth`)
+3. **Remote Camoufox** — for existing deployments
+
+```js
+const page = await web.browse('https://example.com')
+console.log(page.content) // extracted text/markdown
+console.log(page.screenshot) // PNG buffer (if requested)
+
+// With screenshot
+const page = await web.browse('https://example.com', { screenshot: true })
 ```
 
-
+Auto-fallback: if Jina and readability return too little content (<200 chars), Spectrawl renders the page with Playwright and extracts from the rendered DOM. Tavily can't do this — they fail on JS-heavy pages.
 
-##
+## Auth
 
-
-|--------|-----------|---------|
-| SearXNG | Unlimited (self-hosted) | ✅ |
-| DuckDuckGo | Unlimited | ✅ |
-| Brave | 2,000/month | ✅ |
-| Serper | 2,500/month | Fallback |
-| Google CSE | 100/day | Fallback |
-| Jina Reader | Unlimited | Fallback |
+Persistent cookie storage (SQLite), multi-account management, automatic refresh.
 
-
+```js
+// Store cookies
+await web.auth.setCookies('x', '@myhandle', cookies)
 
-
-
-
-    "cascade": ["searxng", "ddg", "brave", "serper", "google-cse", "jina"]
-  }
-}
+// Check health
+const accounts = await web.status()
+// [{ platform: 'x', account: '@myhandle', status: 'valid', expiresAt: '...' }]
 ```
 
-## Platform Adapters
+## Act — 24 Platform Adapters
+
+Post to 30+ platforms with one API:
+
+```js
+await web.act('x', 'post', { text: 'Hello from Spectrawl', account: '@myhandle' })
+await web.act('reddit', 'post', { subreddit: 'node', title: '...', text: '...' })
+await web.act('github', 'create-repo', { name: 'my-repo', description: '...' })
+```
 
 | Platform | Auth Method | Actions |
 |----------|-------------|---------|
 | X/Twitter | GraphQL Cookie + OAuth 1.0a | post |
 | Reddit | Cookie API (oauth.reddit.com) | post, comment |
-| Dev.to | REST API
+| Dev.to | REST API | post |
 | Hashnode | GraphQL API | post |
 | LinkedIn | Cookie API (Voyager) | post |
 | IndieHackers | Browser automation | post, comment, upvote |
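Note on the "Merge & deduplicate" step in the Deep Search Pipeline added above: the diff does not show how Spectrawl decides that two results are the same page. A minimal sketch of one plausible approach follows, assuming results are keyed by a normalized URL and that earlier lists win on conflict. `normalizeUrl` and `mergeResults` are hypothetical names, not part of the package's API.

```js
// Hypothetical helpers for illustration; not taken from the spectrawl source.

// Build a comparison key: ignore protocol, "www.", trailing slashes and #fragments.
function normalizeUrl(raw) {
  const u = new URL(raw)
  const host = u.hostname.replace(/^www\./, '')
  const path = u.pathname.replace(/\/+$/, '')
  return `${host}${path}${u.search}`
}

// Merge any number of result lists, keeping the first result seen per key.
function mergeResults(...lists) {
  const seen = new Map()
  for (const list of lists) {
    for (const result of list) {
      const key = normalizeUrl(result.url)
      if (!seen.has(key)) seen.set(key, result)
    }
  }
  return [...seen.values()]
}

// Made-up sample data shaped like the README's { title, url, ... } results.
const gemini = [
  { title: 'Repo', url: 'https://github.com/example/repo', engine: 'gemini-grounded' },
  { title: 'Docs', url: 'https://example.com/docs/', engine: 'gemini-grounded' }
]
const ddg = [
  { title: 'Docs (duplicate)', url: 'https://www.example.com/docs#intro', engine: 'ddg' },
  { title: 'Blog', url: 'https://example.org/post', engine: 'ddg' }
]

const merged = mergeResults(gemini, ddg)
console.log(merged.map(r => r.title)) // [ 'Repo', 'Docs', 'Blog' ]
```

Passing the Gemini list first means its copy of a duplicated page is the one that survives, which matches the README's description of Gemini Grounded as the primary engine.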
@@ -135,14 +135,70 @@ Configure the cascade in `spectrawl.json`:
 | Discord | Bot API + webhooks | send, thread |
 | Product Hunt | GraphQL v2 | launch, comment, upvote |
 | Hacker News | Cookie/form POST | submit, comment, upvote |
-| YouTube | Data API v3 | comment, playlist
+| YouTube | Data API v3 | comment, playlist |
 | Quora | Browser automation | answer, question |
 | HuggingFace | Hub API | repo, model card, upload |
 | BetaList | REST API | submit |
 | AlternativeTo | Browser automation | submit |
 | SaaSHub | Browser automation | submit |
 | DevHunt | Browser automation | submit |
-| **
+| **14 Directories** | Generic adapter | submit |
+
+Built-in rate limiting, content dedup (MD5, 24h window), and dead letter queue for retries.
+
+## Source Quality Ranking
+
+Spectrawl ranks results by domain trust — something Tavily doesn't do:
+
+- **Boosted:** GitHub, StackOverflow, HN, Reddit, MDN, arxiv, Wikipedia
+- **Penalized:** SEO farms, thin content sites, tag/category pages
+- **Customizable:** bring your own domain weights
+
+```js
+const web = new Spectrawl({
+  search: {
+    sourceRanker: {
+      boost: ['github.com', 'news.ycombinator.com'],
+      block: ['spamsite.com']
+    }
+  }
+})
+```
+
+## HTTP Server
+
+```bash
+npx spectrawl serve --port 3900
+```
+
+```
+POST /search { "query": "...", "summarize": true }
+POST /browse { "url": "...", "screenshot": true }
+POST /act { "platform": "x", "action": "post", "params": { ... } }
+GET /status
+GET /health
+```
+
+## MCP Server
+
+Works with any MCP-compatible agent framework (Claude, OpenAI, etc.):
+
+```bash
+npx spectrawl mcp
+```
+
+5 tools: `web_search`, `web_browse`, `web_act`, `web_auth`, `web_status`.
+
+## CLI
+
+```bash
+npx spectrawl init # create spectrawl.json
+npx spectrawl search "query" # search from terminal
+npx spectrawl status # check auth health
+npx spectrawl serve # start HTTP server
+npx spectrawl mcp # start MCP server
+npx spectrawl install-stealth # download Camoufox browser
+```
 
 ## Configuration
 
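Note on the `sourceRanker` options (`boost`, `block`) introduced in the hunk above: the README names the options but the diff does not include the ranking code. The sketch below shows one way such options could be applied, assuming blocked domains are dropped outright and boosted domains receive a fixed score bonus. The function `rankSources`, the default score of 0.5, and the bonus of 0.3 are all illustrative assumptions, not Spectrawl's actual values.

```js
// Hypothetical ranking sketch; the scoring rule is assumed, not taken from the package.

function hostOf(url) {
  return new URL(url).hostname.replace(/^www\./, '')
}

function rankSources(results, { boost = [], block = [] } = {}) {
  return results
    // Drop anything from a blocked domain.
    .filter(r => !block.includes(hostOf(r.url)))
    // Add a fixed bonus to boosted domains; unscored results start at 0.5.
    .map(r => ({
      ...r,
      score: (r.score ?? 0.5) + (boost.includes(hostOf(r.url)) ? 0.3 : 0)
    }))
    // Highest score first.
    .sort((a, b) => b.score - a.score)
}

// Made-up sample results.
const ranked = rankSources(
  [
    { title: 'Spam', url: 'https://spamsite.com/best-10-tools', score: 0.9 },
    { title: 'Blog', url: 'https://blog.example.org/post', score: 0.6 },
    { title: 'Repo', url: 'https://github.com/example/repo', score: 0.5 }
  ],
  { boost: ['github.com', 'news.ycombinator.com'], block: ['spamsite.com'] }
)

console.log(ranked.map(r => r.title)) // [ 'Repo', 'Blog' ]
```

The type declarations later in this diff also list a `weights?: Record<string, number>` option; a per-domain multiplier would slot into the `map` step in the same way.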
@@ -150,28 +206,23 @@ Configure the cascade in `spectrawl.json`:
 
 ```json
 {
-  "port": 3900,
   "search": {
-    "cascade": ["
+    "cascade": ["gemini-grounded", "brave", "ddg"],
     "scrapeTop": 3
   },
   "cache": {
-    "
-    "
-    "scrapeTtl": 24
+    "searchTtl": 3600,
+    "scrapeTtl": 86400
   },
   "proxy": {
     "localPort": 8080,
     "strategy": "round-robin",
     "upstreams": [
-      { "url": "http://user:pass@
+      { "url": "http://user:pass@proxy.example.com:8080" }
     ]
   },
-  "camoufox": {
-    "url": "http://localhost:9869"
-  },
   "rateLimit": {
-    "x": { "postsPerHour": 3
+    "x": { "postsPerHour": 3 },
     "reddit": { "postsPerHour": 5 }
   }
 }
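Note on the cache values in the configuration hunk above: `"scrapeTtl": 24` is replaced by `"scrapeTtl": 86400`, and `"searchTtl": 3600` appears alongside it. The diff does not state a unit, but 3600 and 86400 are exactly one hour and 24 hours expressed in seconds, so the documented values appear to have moved from hours to seconds. The sketch below assumes that reading; `hoursToSeconds` and `isFresh` are hypothetical helpers for anyone porting an older config.

```js
// Assumption: TTLs in the new config are seconds. Helper names are illustrative.

const SECONDS_PER_HOUR = 3600

function hoursToSeconds(hours) {
  return hours * SECONDS_PER_HOUR
}

// True while a cache entry stored at `storedAtMs` is younger than its TTL.
function isFresh(storedAtMs, ttlSeconds, nowMs = Date.now()) {
  return nowMs - storedAtMs < ttlSeconds * 1000
}

// Rebuild the documented cache block from hour-based values.
const cache = {
  searchTtl: hoursToSeconds(1),
  scrapeTtl: hoursToSeconds(24)
}

console.log(cache) // { searchTtl: 3600, scrapeTtl: 86400 }
```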
@@ -180,15 +231,14 @@ Configure the cascade in `spectrawl.json`:
 ## Environment Variables
 
 ```
-
+GEMINI_API_KEY Gemini API key (free — primary search + summarization)
+BRAVE_API_KEY Brave Search API key (2,000 free/month)
 SERPER_API_KEY Serper.dev API key
 GOOGLE_CSE_KEY Google Custom Search API key
 GOOGLE_CSE_CX Google Custom Search engine ID
 JINA_API_KEY Jina Reader API key (optional)
-
-
-OPENAI_API_KEY For LLM summarization
-ANTHROPIC_API_KEY For LLM summarization
+OPENAI_API_KEY For LLM summarization (alternative to Gemini)
+ANTHROPIC_API_KEY For LLM summarization (alternative to Gemini)
 ```
 
 ## License
package/index.d.ts
CHANGED
@@ -1,90 +1,108 @@
 declare module 'spectrawl' {
-
-
-
-
-
-
-
-
-
-
+  interface SpectrawlConfig {
+    search?: {
+      cascade?: string[]
+      scrapeTop?: number
+      geminiKey?: string
+      'gemini-grounded'?: { apiKey?: string; model?: string }
+      llm?: { provider: string; model?: string; apiKey?: string }
+      sourceRanker?: {
+        weights?: Record<string, number>
+        boost?: string[]
+        block?: string[]
+      }
+    }
+    browse?: {
+      defaultEngine?: string
+      proxy?: { type: string; host: string; port: number; username?: string; password?: string }
+      humanlike?: { minDelay?: number; maxDelay?: number; scrollBehavior?: boolean }
+    }
+    auth?: {
+      refreshInterval?: string
+      cookieStore?: string
+    }
+    cache?: {
+      path?: string
+      searchTtl?: number
+      scrapeTtl?: number
+      screenshotTtl?: number
+    }
+    rateLimit?: Record<string, { postsPerHour?: number; minDelayMs?: number }>
+    proxy?: {
+      localPort?: number
+      strategy?: 'round-robin' | 'random' | 'least-used'
+      upstreams?: { url: string }[]
+    }
   }
 
-
-    content?: string
-    html?: string
-    screenshot?: Buffer
-    url: string
+  interface SearchResult {
     title: string
-
+    url: string
+    snippet: string
+    content?: string
+    score?: number
+    engine?: string
+  }
+
+  interface SearchResponse {
+    answer: string | null
+    sources: SearchResult[]
     cached: boolean
-    cookies?: any[]
   }
 
-
-
-
-
-
-    retryAfter?: number
-    url?: string
-    [key: string]: any
+  interface DeepSearchResponse {
+    answer: string | null
+    sources: SearchResult[]
+    queries: string[]
+    cached: boolean
   }
 
-
+  interface DeepSearchOptions {
+    mode?: 'fast' | 'full'
+    scrapeTop?: number
+    expand?: boolean
+    rerank?: boolean
+  }
+
+  interface BrowseResult {
+    content: string
+    text?: string
+    screenshot?: Buffer
+    engine: string
+    url: string
+  }
+
+  interface AuthStatus {
     platform: string
     account: string
-    status: 'valid' | '
+    status: 'valid' | 'expired' | 'unknown'
     expiresAt?: string
-    cookieCount?: number
   }
 
-
-
-    minResults?: number
-    noCache?: boolean
-  }
+  class Spectrawl {
+    constructor(config?: SpectrawlConfig | string)
 
-
-
-    fullPage?: boolean
-    html?: boolean
-    extract?: boolean
-    stealth?: boolean
-    camoufox?: boolean
-    noCache?: boolean
-    saveCookies?: boolean
-    _cookies?: any[]
-  }
+    /** Basic search — raw results from cascade engines */
+    search(query: string, opts?: { summarize?: boolean; scrapeTop?: number; engines?: string[] }): Promise<SearchResponse>
 
-
-
-
-
-
-
-
-
-    mediaIds?: string[]
-    _cookies?: any[]
-    [key: string]: any
-  }
+    /** Deep search — Tavily-equivalent with citations. Set GEMINI_API_KEY for best results. */
+    deepSearch(query: string, opts?: DeepSearchOptions): Promise<DeepSearchResponse>
+
+    /** Browse a URL with stealth anti-detection */
+    browse(url: string, opts?: { screenshot?: boolean; timeout?: number; extractText?: boolean }): Promise<BrowseResult>
+
+    /** Act on a platform (post, comment, submit) */
+    act(platform: string, action: string, params: Record<string, any>): Promise<any>
 
-
-
-
-
-
-
+    /** Check auth health for all configured accounts */
+    status(): Promise<AuthStatus[]>
+
+    /** Get raw Playwright page for custom automation */
+    getPage(url: string, opts?: any): Promise<any>
+
+    /** Close all connections */
     close(): Promise<void>
   }
 
-  export
-    path: string
-    binary?: string
-    version: string
-  }>
-
-  export function isStealthInstalled(): boolean
+  export { Spectrawl, SpectrawlConfig, SearchResult, SearchResponse, DeepSearchResponse, DeepSearchOptions, BrowseResult, AuthStatus }
 }
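Note on the new response types: both `SearchResponse` and `DeepSearchResponse` declare `answer: string | null`, and the README says a keyless setup returns raw results with no AI answer, so callers should expect `null` there. The sketch below shows one way to render a response defensively. `formatAnswer` is a hypothetical helper, and the sample object is made-up data shaped like `DeepSearchResponse`; no Spectrawl call is made.

```js
// Hypothetical formatter for an object shaped like DeepSearchResponse.

function formatAnswer(response) {
  // Number sources from 1 so they line up with [1] [2] style citations.
  const refs = response.sources.map((s, i) => `[${i + 1}] ${s.title} - ${s.url}`)
  // Fall back to a placeholder when no summarizer produced an answer.
  const body = response.answer ?? '(no AI answer, showing raw sources)'
  return [body, ...refs].join('\n')
}

// Made-up response, as might come back when no LLM key is configured.
const sample = {
  answer: null,
  sources: [
    { title: 'Example Repo', url: 'https://github.com/example/repo', snippet: '...' },
    { title: 'Example Docs', url: 'https://example.com/docs', snippet: '...' }
  ],
  queries: ['example query'],
  cached: false
}

console.log(formatAnswer(sample))
```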
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "spectrawl",
-  "version": "0.3.
+  "version": "0.3.9",
   "description": "The unified web layer for AI agents. Search (6 engines), stealth browse (Camoufox + Playwright), auth (cookies, multi-account), act (24 adapters, 30+ platforms), proxy rotation. Self-hosted, free.",
   "main": "src/index.js",
   "types": "index.d.ts",
package/src/search/summarizer.js
CHANGED
@@ -7,11 +7,23 @@ const https = require('https')
 class Summarizer {
   constructor(config = {}) {
     this.provider = config.provider || 'openai'
-    this.model = config.model ||
+    this.model = config.model || this._defaultModel()
     this.apiKey = config.apiKey || process.env[this._envKey()]
     this.baseUrl = config.baseUrl || null
   }
 
+  _defaultModel() {
+    const defaults = {
+      openai: 'gpt-4o-mini',
+      anthropic: 'claude-3-5-haiku-20241022',
+      gemini: 'gemini-2.5-flash',
+      minimax: 'MiniMax-Text-01',
+      xai: 'grok-3-mini-fast',
+      ollama: 'llama3'
+    }
+    return defaults[this.provider] || 'gpt-4o-mini'
+  }
+
   _envKey() {
     const keys = {
       openai: 'OPENAI_API_KEY',
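Note on the summarizer change: the model default now depends on the provider instead of being a single value. The standalone sketch below reproduces the lookup order visible in the hunk (explicit `config.model`, then the per-provider table, then `'gpt-4o-mini'`) so the resolution can be checked without loading the package. `resolveModel` is a hypothetical free function; the real logic lives in `Summarizer`'s constructor and `_defaultModel()`.

```js
// Standalone re-statement of the lookup shown in the diff; the table is copied from it.

const DEFAULT_MODELS = {
  openai: 'gpt-4o-mini',
  anthropic: 'claude-3-5-haiku-20241022',
  gemini: 'gemini-2.5-flash',
  minimax: 'MiniMax-Text-01',
  xai: 'grok-3-mini-fast',
  ollama: 'llama3'
}

function resolveModel(config = {}) {
  const provider = config.provider || 'openai'
  // Explicit model wins, then the provider default, then the global fallback.
  return config.model || DEFAULT_MODELS[provider] || 'gpt-4o-mini'
}

console.log(resolveModel({ provider: 'gemini' }))                   // gemini-2.5-flash
console.log(resolveModel({ provider: 'ollama', model: 'mistral' })) // mistral
console.log(resolveModel({}))                                       // gpt-4o-mini
console.log(resolveModel({ provider: 'some-other-provider' }))      // gpt-4o-mini
```

As the last call shows, a provider missing from the table still resolves to an OpenAI model name, so a custom provider would likely need `model` set explicitly.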