spectrawl 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 FayAndXan
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,179 @@
+ # Spectrawl
+
+ The unified web layer for AI agents. Search, browse, authenticate, and act on platforms: one tool, self-hosted, free.
+
+ ## What It Does
+
+ AI agents need to interact with the web. That means searching, browsing pages, logging into platforms, and posting content. Today you duct-tape together Playwright + Tavily + cookie managers + platform-specific scripts. Spectrawl replaces all of that.
+
+ ```
+ npm install spectrawl
+ ```
+
+ **Search**: 6 engines in a cascade (SearXNG → DuckDuckGo → Brave → Serper → Google CSE → Jina). Tries free/unlimited engines first, then falls through to quota-based ones. Dual scraping (Jina Reader + readability). Optional LLM summarization.
+
+ **Browse**: Stealth browsing with anti-detection out of the box. Three tiers:
+ 1. `playwright-extra` + stealth plugin (default, works immediately)
+ 2. Camoufox binary: engine-level anti-fingerprinting (`npx spectrawl install-stealth`)
+ 3. Remote Camoufox service (for existing deployments)
+
+ **Auth**: Persistent cookie storage (SQLite), multi-account management, automatic cookie refresh, expiry alerts.
+
+ **Act**: Post to X, Reddit, Dev.to, Hashnode, LinkedIn, IndieHackers. Rate limiting, content dedup, dead letter queue for retries.
+
+ ## Quick Start
+
+ ```bash
+ npm install spectrawl
+ npx spectrawl init # create spectrawl.json config
+ npx spectrawl search "your query"
+ ```
+
+ ### As a Library
+
+ ```js
+ const { Spectrawl } = require('spectrawl')
+ const web = new Spectrawl()
+
+ // Search
+ const results = await web.search('best practices for node.js APIs')
+ console.log(results.sources) // [{ url, title, snippet, content }]
+ console.log(results.answer) // LLM summary (if configured)
+
+ // Browse with stealth
+ const page = await web.browse('https://example.com')
+ console.log(page.content) // extracted text
+ console.log(page.engine) // 'stealth-playwright' or 'camoufox'
+
+ // Act on platforms
+ await web.act('x', 'post', {
+   text: 'Hello from Spectrawl',
+   account: '@myhandle'
+ })
+
+ // Check auth health
+ const accounts = await web.status()
+ // [{ platform: 'x', account: '@myhandle', status: 'valid', expiresAt: '...' }]
+ ```
+
+ ### HTTP Server
+
+ ```bash
+ npx spectrawl serve --port 3900
+ ```
+
+ ```
+ POST /search   { "query": "...", "summarize": true }
+ POST /browse   { "url": "...", "screenshot": true }
+ POST /act      { "platform": "x", "action": "post", "params": { "text": "..." } }
+ GET  /status
+ GET  /health
+ ```
+
+ ### MCP Server
+
+ Works with any MCP-compatible agent framework:
+
+ ```bash
+ npx spectrawl mcp
+ ```
+
+ Exposes 5 tools: `web_search`, `web_browse`, `web_act`, `web_auth`, `web_status`.
+
+ ## Stealth Browsing
+
+ Default: `playwright-extra` with the stealth plugin patches webdriver detection, navigator properties, canvas/WebGL fingerprinting, and plugin enumeration. Works for ~90% of sites.
+
+ For deeper anti-detection:
+
+ ```bash
+ npx spectrawl install-stealth
+ ```
+
+ Downloads the [Camoufox](https://github.com/daijro/camoufox) browser, a patched Firefox with engine-level anti-fingerprinting. Spectrawl auto-detects and uses it.
+
+ ## Search Engines
+
+ | Engine | Free Tier | Default |
+ |--------|-----------|---------|
+ | SearXNG | Unlimited (self-hosted) | ✅ |
+ | DuckDuckGo | Unlimited | ✅ |
+ | Brave | 2,000/month | ✅ |
+ | Serper | 2,500/month | Fallback |
+ | Google CSE | 100/day | Fallback |
+ | Jina Reader | Unlimited | Fallback |
+
+ Configure the cascade in `spectrawl.json`:
+
+ ```json
+ {
+   "search": {
+     "cascade": ["searxng", "ddg", "brave", "serper", "google-cse", "jina"]
+   }
+ }
+ ```
+
+ ## Platform Adapters
+
+ | Platform | Auth Method | Actions |
+ |----------|-------------|---------|
+ | X/Twitter | GraphQL Cookie + OAuth 1.0a | post |
+ | Reddit | Cookie API (oauth.reddit.com) | post, comment |
+ | Dev.to | REST API (API key) | post |
+ | Hashnode | GraphQL API | post |
+ | LinkedIn | Cookie API (Voyager) | post |
+ | IndieHackers | Browser automation | post, comment, upvote |
+
+ ## Configuration
+
+ `spectrawl.json`:
+
+ ```json
+ {
+   "port": 3900,
+   "search": {
+     "cascade": ["ddg", "brave"],
+     "scrapeTop": 3
+   },
+   "cache": {
+     "path": "./data/cache.db",
+     "searchTtl": 1,
+     "scrapeTtl": 24
+   },
+   "proxy": {
+     "host": "proxy.example.com",
+     "port": "8080",
+     "username": "user",
+     "password": "pass"
+   },
+   "camoufox": {
+     "url": "http://localhost:9869"
+   },
+   "rateLimit": {
+     "x": { "postsPerHour": 3, "minDelayMs": 60000 },
+     "reddit": { "postsPerHour": 5 }
+   }
+ }
+ ```
+
+ ## Environment Variables
+
+ ```
+ BRAVE_API_KEY       Brave Search API key
+ SERPER_API_KEY      Serper.dev API key
+ GOOGLE_CSE_KEY      Google Custom Search API key
+ GOOGLE_CSE_CX       Google Custom Search engine ID
+ JINA_API_KEY        Jina Reader API key (optional)
+ SEARXNG_URL         SearXNG instance URL (default: http://localhost:8888)
+ CAMOUFOX_URL        Remote Camoufox service URL
+ OPENAI_API_KEY      For LLM summarization
+ ANTHROPIC_API_KEY   For LLM summarization
+ ```
+
+ ## License
+
+ MIT
+
+ ## Part of xanOS
+
+ Spectrawl is the web layer for [xanOS](https://github.com/FayAndXan/xanOS), the autonomous content engine. Use it standalone or as part of the full stack.
package/index.d.ts ADDED
@@ -0,0 +1,90 @@
+ declare module 'spectrawl' {
+   export interface SearchResult {
+     sources: Array<{
+       url: string
+       title: string
+       snippet?: string
+       content?: string
+     }>
+     answer?: string
+     engine: string
+     cached: boolean
+   }
+
+   export interface BrowseResult {
+     content?: string
+     html?: string
+     screenshot?: Buffer
+     url: string
+     title: string
+     engine: 'stealth-playwright' | 'camoufox' | 'remote-camoufox' | 'playwright'
+     cached: boolean
+     cookies?: any[]
+   }
+
+   export interface ActResult {
+     success: boolean
+     error?: string
+     detail?: string
+     suggestion?: string
+     retryAfter?: number
+     url?: string
+     [key: string]: any
+   }
+
+   export interface AccountStatus {
+     platform: string
+     account: string
+     status: 'valid' | 'expiring' | 'expired' | 'unknown'
+     expiresAt?: string
+     cookieCount?: number
+   }
+
+   export interface SearchOptions {
+     summarize?: boolean
+     minResults?: number
+     noCache?: boolean
+   }
+
+   export interface BrowseOptions {
+     screenshot?: boolean
+     fullPage?: boolean
+     html?: boolean
+     extract?: boolean
+     stealth?: boolean
+     camoufox?: boolean
+     noCache?: boolean
+     saveCookies?: boolean
+     _cookies?: any[]
+   }
+
+   export interface ActParams {
+     text?: string
+     title?: string
+     body?: string
+     account?: string
+     group?: string
+     postUrl?: string
+     tweetId?: string
+     mediaIds?: string[]
+     _cookies?: any[]
+     [key: string]: any
+   }
+
+   export class Spectrawl {
+     constructor(config?: any)
+     search(query: string, opts?: SearchOptions): Promise<SearchResult>
+     browse(url: string, opts?: BrowseOptions): Promise<BrowseResult>
+     act(platform: string, action: string, params?: ActParams): Promise<ActResult>
+     status(): Promise<AccountStatus[]>
+     close(): Promise<void>
+   }
+
+   export function installStealth(): Promise<{
+     path: string
+     binary?: string
+     version: string
+   }>
+
+   export function isStealthInstalled(): boolean
+ }
package/package.json ADDED
@@ -0,0 +1,53 @@
+ {
+   "name": "spectrawl",
+   "version": "0.1.0",
+   "description": "The unified web layer for AI agents. Search, browse, authenticate, act: one tool, self-hosted, free.",
+   "main": "src/index.js",
+   "types": "index.d.ts",
+   "bin": {
+     "spectrawl": "./src/cli.js"
+   },
+   "files": [
+     "src/",
+     "index.d.ts",
+     "README.md",
+     "LICENSE"
+   ],
+   "scripts": {
+     "dev": "node src/server.js",
+     "start": "node src/server.js",
+     "test": "node --test test/*.test.js"
+   },
+   "keywords": [
+     "ai-agent",
+     "web-scraping",
+     "browser-automation",
+     "search-api",
+     "mcp",
+     "stealth-browser",
+     "cookie-management",
+     "tavily-alternative",
+     "anti-detect",
+     "camoufox",
+     "playwright"
+   ],
+   "author": "FayAndXan",
+   "license": "MIT",
+   "repository": {
+     "type": "git",
+     "url": "https://github.com/FayAndXan/spectrawl"
+   },
+   "homepage": "https://github.com/FayAndXan/spectrawl#readme",
+   "bugs": {
+     "url": "https://github.com/FayAndXan/spectrawl/issues"
+   },
+   "engines": {
+     "node": ">=20.0.0"
+   },
+   "dependencies": {
+     "better-sqlite3": "^11.0.0",
+     "playwright": "^1.50.0",
+     "playwright-extra": "^4.3.6",
+     "puppeteer-extra-plugin-stealth": "^2.11.2"
+   }
+ }
@@ -0,0 +1,103 @@
+ const https = require('https')
+
+ /**
+  * Dev.to platform adapter.
+  * Uses the official REST API: simple, no cookies needed.
+  */
+ class DevtoAdapter {
+   async execute(action, params, ctx) {
+     switch (action) {
+       case 'post':
+         return this._post(params, ctx)
+       case 'update':
+         return this._update(params, ctx)
+       default:
+         throw new Error(`Unsupported Dev.to action: ${action}`)
+     }
+   }
+
+   async _post(params, ctx) {
+     const { title, body, tags, published, account } = params
+
+     const apiKey = await this._getApiKey(account, ctx)
+
+     const article = {
+       article: {
+         title,
+         body_markdown: body,
+         published: published !== false,
+         tags: tags || []
+       }
+     }
+
+     const data = await postJson('https://dev.to/api/articles', JSON.stringify(article), {
+       'api-key': apiKey,
+       'Content-Type': 'application/json'
+     })
+
+     return { articleId: data.id, url: data.url, slug: data.slug }
+   }
+
+   async _update(params, ctx) {
+     const { articleId, title, body, tags, published, account } = params
+
+     const apiKey = await this._getApiKey(account, ctx)
+
+     const article = { article: {} }
+     if (title) article.article.title = title
+     if (body) article.article.body_markdown = body
+     if (tags) article.article.tags = tags
+     if (published !== undefined) article.article.published = published
+
+     const data = await putJson(`https://dev.to/api/articles/${articleId}`, JSON.stringify(article), {
+       'api-key': apiKey,
+       'Content-Type': 'application/json'
+     })
+
+     return { articleId: data.id, url: data.url }
+   }
+
+   async _getApiKey(account, ctx) {
+     // Try to get API key from auth store
+     const creds = await ctx.auth.getCookies('devto', account)
+     if (creds?.apiKey) return creds.apiKey
+
+     // Try env
+     if (process.env.DEVTO_API_KEY) return process.env.DEVTO_API_KEY
+
+     throw new Error('Dev.to API key not configured. Run: spectrawl login devto --api-key YOUR_KEY')
+   }
+ }
+
+ function postJson(url, body, headers) {
+   return jsonRequest('POST', url, body, headers)
+ }
+
+ function putJson(url, body, headers) {
+   return jsonRequest('PUT', url, body, headers)
+ }
+
+ function jsonRequest(method, url, body, headers) {
+   return new Promise((resolve, reject) => {
+     const urlObj = new URL(url)
+     const opts = {
+       hostname: urlObj.hostname,
+       path: urlObj.pathname + urlObj.search,
+       method,
+       headers: { ...headers, 'Content-Length': Buffer.byteLength(body) }
+     }
+     const req = https.request(opts, res => {
+       let data = ''
+       res.on('data', c => data += c)
+       res.on('end', () => {
+         if (res.statusCode >= 400) return reject(new Error(`Dev.to API error ${res.statusCode}: ${data.slice(0, 200)}`))
+         try { resolve(JSON.parse(data)) } catch { reject(new Error(`Invalid Dev.to response: ${data.slice(0, 200)}`)) }
+       })
+     })
+     req.on('error', reject)
+     req.write(body)
+     req.end()
+   })
+ }
+
+ module.exports = { DevtoAdapter }
@@ -0,0 +1,89 @@
+ const https = require('https')
+
+ /**
+  * Hashnode platform adapter.
+  * Uses official GraphQL API.
+  */
+ class HashnodeAdapter {
+   async execute(action, params, ctx) {
+     switch (action) {
+       case 'post':
+         return this._post(params, ctx)
+       default:
+         throw new Error(`Unsupported Hashnode action: ${action}`)
+     }
+   }
+
+   async _post(params, ctx) {
+     const { title, body, tags, publicationId, account } = params
+     const apiKey = await this._getApiKey(account, ctx)
+
+     const query = `
+       mutation PublishPost($input: PublishPostInput!) {
+         publishPost(input: $input) {
+           post { id, slug, url, title }
+         }
+       }
+     `
+
+     const variables = {
+       input: {
+         title,
+         contentMarkdown: body,
+         publicationId: publicationId || await this._getPublicationId(apiKey),
+         tags: (tags || []).map(t => ({ slug: t.toLowerCase().replace(/\s+/g, '-'), name: t }))
+       }
+     }
+
+     const data = await graphql(apiKey, query, variables)
+
+     if (data.errors) {
+       throw new Error(`Hashnode error: ${data.errors[0]?.message}`)
+     }
+
+     const post = data.data?.publishPost?.post
+     return { postId: post?.id, url: post?.url, slug: post?.slug }
+   }
+
+   async _getPublicationId(apiKey) {
+     const query = `query { me { publications(first: 1) { edges { node { id } } } } }`
+     const data = await graphql(apiKey, query)
+     return data.data?.me?.publications?.edges?.[0]?.node?.id
+   }
+
+   async _getApiKey(account, ctx) {
+     const creds = await ctx.auth.getCookies('hashnode', account)
+     if (creds?.apiKey) return creds.apiKey
+     if (process.env.HASHNODE_API_KEY) return process.env.HASHNODE_API_KEY
+     throw new Error('Hashnode API key not configured')
+   }
+ }
+
+ function graphql(apiKey, query, variables = {}) {
+   return new Promise((resolve, reject) => {
+     const body = JSON.stringify({ query, variables })
+     const opts = {
+       hostname: 'gql.hashnode.com',
+       path: '/',
+       method: 'POST',
+       headers: {
+         'Authorization': apiKey,
+         'Content-Type': 'application/json',
+         'Content-Length': Buffer.byteLength(body)
+       }
+     }
+     const req = https.request(opts, res => {
+       let data = ''
+       res.on('data', c => data += c)
+       res.on('end', () => {
+         try { resolve(JSON.parse(data)) }
+         catch (e) { reject(new Error('Invalid Hashnode response')) }
+       })
+     })
+     req.on('error', reject)
+     req.write(body)
+     req.end()
+   })
+ }
+
+ module.exports = { HashnodeAdapter }