webpeel 0.6.1 → 0.7.1
- package/README.md +140 -500
- package/dist/cli-auth.d.ts +2 -0
- package/dist/cli-auth.d.ts.map +1 -1
- package/dist/cli-auth.js +16 -3
- package/dist/cli-auth.js.map +1 -1
- package/dist/cli.js +475 -77
- package/dist/cli.js.map +1 -1
- package/dist/core/actions.d.ts +19 -10
- package/dist/core/actions.d.ts.map +1 -1
- package/dist/core/actions.js +214 -43
- package/dist/core/actions.js.map +1 -1
- package/dist/core/agent.d.ts +60 -3
- package/dist/core/agent.d.ts.map +1 -1
- package/dist/core/agent.js +375 -86
- package/dist/core/agent.js.map +1 -1
- package/dist/core/answer.d.ts +43 -0
- package/dist/core/answer.d.ts.map +1 -0
- package/dist/core/answer.js +378 -0
- package/dist/core/answer.js.map +1 -0
- package/dist/core/cache.d.ts +14 -0
- package/dist/core/cache.d.ts.map +1 -0
- package/dist/core/cache.js +122 -0
- package/dist/core/cache.js.map +1 -0
- package/dist/core/dns-cache.d.ts +21 -0
- package/dist/core/dns-cache.d.ts.map +1 -0
- package/dist/core/dns-cache.js +184 -0
- package/dist/core/dns-cache.js.map +1 -0
- package/dist/core/documents.d.ts +24 -0
- package/dist/core/documents.d.ts.map +1 -0
- package/dist/core/documents.js +124 -0
- package/dist/core/documents.js.map +1 -0
- package/dist/core/extract-inline.d.ts +39 -0
- package/dist/core/extract-inline.d.ts.map +1 -0
- package/dist/core/extract-inline.js +214 -0
- package/dist/core/extract-inline.js.map +1 -0
- package/dist/core/fetcher.d.ts +33 -7
- package/dist/core/fetcher.d.ts.map +1 -1
- package/dist/core/fetcher.js +608 -41
- package/dist/core/fetcher.js.map +1 -1
- package/dist/core/jobs.d.ts +66 -0
- package/dist/core/jobs.d.ts.map +1 -0
- package/dist/core/jobs.js +513 -0
- package/dist/core/jobs.js.map +1 -0
- package/dist/core/markdown.d.ts.map +1 -1
- package/dist/core/markdown.js +141 -31
- package/dist/core/markdown.js.map +1 -1
- package/dist/core/pdf.d.ts.map +1 -1
- package/dist/core/pdf.js +3 -1
- package/dist/core/pdf.js.map +1 -1
- package/dist/core/screenshot.d.ts +33 -0
- package/dist/core/screenshot.d.ts.map +1 -0
- package/dist/core/screenshot.js +30 -0
- package/dist/core/screenshot.js.map +1 -0
- package/dist/core/search-provider.d.ts +46 -0
- package/dist/core/search-provider.d.ts.map +1 -0
- package/dist/core/search-provider.js +281 -0
- package/dist/core/search-provider.js.map +1 -0
- package/dist/core/strategies.d.ts +7 -10
- package/dist/core/strategies.d.ts.map +1 -1
- package/dist/core/strategies.js +370 -63
- package/dist/core/strategies.js.map +1 -1
- package/dist/index.d.ts +9 -3
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +61 -32
- package/dist/index.js.map +1 -1
- package/dist/mcp/server.js +335 -70
- package/dist/mcp/server.js.map +1 -1
- package/dist/types.d.ts +43 -1
- package/dist/types.d.ts.map +1 -1
- package/dist/types.js.map +1 -1
- package/llms.txt +85 -47
- package/package.json +11 -5
package/README.md
CHANGED
@@ -1,236 +1,125 @@
-
+<p align="center">
+<a href="https://webpeel.dev">
+<img src=".github/banner.svg" alt="WebPeel — Web fetching for AI agents" width="100%">
+</a>
+</p>
+
+<p align="center">
+<a href="https://www.npmjs.com/package/webpeel"><img src="https://img.shields.io/npm/v/webpeel.svg" alt="npm version"></a>
+<a href="https://pypi.org/project/webpeel/"><img src="https://img.shields.io/pypi/v/webpeel.svg" alt="PyPI version"></a>
+<a href="https://www.npmjs.com/package/webpeel"><img src="https://img.shields.io/npm/dm/webpeel.svg" alt="npm downloads"></a>
+<a href="https://github.com/webpeel/webpeel/stargazers"><img src="https://img.shields.io/github/stars/webpeel/webpeel.svg" alt="GitHub stars"></a>
+<a href="https://github.com/webpeel/webpeel/actions/workflows/ci.yml"><img src="https://github.com/webpeel/webpeel/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
+<a href="https://www.typescriptlang.org/"><img src="https://img.shields.io/badge/TypeScript-5.6-blue.svg" alt="TypeScript"></a>
+<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="MIT License"></a>
+</p>
+
+<p align="center">
+<b>Turn any web page into AI-ready markdown. Smart escalation. Stealth mode. Free to start.</b>
+</p>
+
+<p align="center">
+<a href="https://webpeel.dev">Website</a> ·
+<a href="https://webpeel.dev/docs">Docs</a> ·
+<a href="https://webpeel.dev/playground">Playground</a> ·
+<a href="https://app.webpeel.dev">Dashboard</a> ·
+<a href="https://github.com/webpeel/webpeel/discussions">Discussions</a>
+</p>
 
-
-[](https://www.npmjs.com/package/webpeel)
-[](https://github.com/JakeLiuMe/webpeel/stargazers)
-[](https://github.com/JakeLiuMe/webpeel/actions/workflows/ci.yml)
-[](https://www.typescriptlang.org/)
-[](https://opensource.org/licenses/MIT)
+---
 
-
+## Quick Start
 
 ```bash
+# Zero install — just run it
 npx webpeel https://news.ycombinator.com
 ```
 
-**Output:**
-```markdown
-# Hacker News
-
-**New** | **Past** | **Comments** | **Ask** | **Show** | **Jobs** | **Submit**
-
-## Top Stories
-
-1. **Show HN: WebPeel – Turn any webpage into AI-ready markdown**
-[https://github.com/JakeLiuMe/webpeel](https://github.com/JakeLiuMe/webpeel)
-142 points by jakeliu 2 hours ago | 31 comments
-
-2. **The End of the API Era**
-...
-```
-
----
-
-## Why WebPeel?
-
-| | **WebPeel** | Firecrawl | Jina Reader | MCP Fetch |
-|---|:---:|:---:|:---:|:---:|
-| **Free tier** | ✅ 125/week | 500 one-time | ❌ Cloud only | ✅ Unlimited |
-| **Smart escalation** | ✅ HTTP→Browser→Stealth | Manual mode | ❌ No | ❌ No |
-| **Stealth mode** | ✅ All plans | ✅ Yes | ⚠️ Limited | ❌ No |
-| **Crawl + Map** | ✅ All plans | ✅ Yes | ❌ No | ❌ No |
-| **AI Extraction** | ✅ BYOK (any LLM) | ✅ Built-in | ❌ No | ❌ No |
-| **Branding** | ✅ Design system | ✅ Yes | ❌ No | ❌ No |
-| **Change Tracking** | ✅ Local snapshots | ✅ Server-side | ❌ No | ❌ No |
-| **Python SDK** | ✅ Zero deps | ✅ httpx/pydantic | ❌ No | ❌ No |
-| **LangChain** | ✅ Official | ✅ Official | ❌ No | ❌ No |
-| **MCP Server** | ✅ Built-in (6 tools) | ✅ Separate repo | ❌ No | ✅ Yes |
-| **Token Budget** | ✅ `--max-tokens` | ❌ No | ❌ No | ❌ No |
-| **Zero config** | ✅ `npx webpeel` | ❌ API key required | ❌ API key required | ✅ Yes |
-| **Pricing** | $0 local / $9-$29 | $16-$333/mo | $10/mo+ | Free |
-| **License** | MIT | AGPL-3.0 | Proprietary | MIT |
-
-**WebPeel gives you Firecrawl's power with a generous free tier and MIT license.**
-
-### Usage Model
-
-WebPeel uses a **weekly usage budget** for all users (CLI and API):
-
-- **First 25 fetches**: No account needed — try it instantly
-- **Free tier**: 125 fetches/week (resets every Monday)
-- **Pro tier**: 1,250 fetches/week ($9/mo)
-- **Max tier**: 6,250 fetches/week ($29/mo)
-
-**Credit costs**: Basic fetch = 1 credit, Stealth mode = 5 credits, Search = 1 credit, Crawl = 1 credit/page
-
-**Open source**: The CLI is MIT licensed — you can self-host if needed. But the hosted API requires authentication after 25 fetches.
-
-### Highlights
-
-1. **🎭 Stealth Mode** — Bypass bot detection with playwright-extra stealth plugin. Works on sites that block regular scrapers.
-2. **🕷️ Crawl Mode** — Follow links and extract entire sites. Respects robots.txt and rate limits automatically.
-3. **💰 Generous Free Tier** — 125 free fetches every week. First 25 work instantly with no signup. Basic fetch + JS rendering included free.
-
----
-
-## Quick Start
-
-### CLI (Zero Install)
-
 ```bash
-# First 25 fetches work instantly, no signup
-npx webpeel https://example.com
-
-# After 25 fetches, sign up for free (125/week)
-webpeel login
-
-# Check your usage
-webpeel usage
-
 # Stealth mode (bypass bot detection)
 npx webpeel https://protected-site.com --stealth
 
-#
-npx webpeel https://example.com --
-
-# Structured data extraction with CSS selectors (v0.4.0)
-npx webpeel https://example.com --extract '{"title": "h1", "price": ".price", "description": ".desc"}'
-
-# Token budget: truncate output to max tokens (v0.4.0)
-npx webpeel https://example.com --max-tokens 2000
-
-# Map discovery: find all URLs on a domain via sitemap & crawling (v0.4.0)
-npx webpeel map https://example.com --max-urls 5000
-
-# Extract branding/design system from a page (v0.5.0)
-npx webpeel brand https://example.com
+# Crawl a website
+npx webpeel crawl https://example.com --max-pages 20
 
-#
-npx webpeel
+# Search the web
+npx webpeel search "best AI frameworks 2026"
 
-#
-npx webpeel
-
-# Sitemap-first crawl with content deduplication (v0.4.0)
-npx webpeel crawl https://example.com --sitemap-first --max-pages 100
-
-# JSON output with metadata
-npx webpeel https://example.com --json
+# Autonomous agent (BYOK LLM)
+npx webpeel agent "Find the founders of Stripe" --llm-key sk-...
+```
 
-
-npx webpeel https://example.com --cache 5m
+First 25 fetches work instantly, no signup. After that, [sign up free](https://app.webpeel.dev/signup) for 125/week.
 
-
-npx webpeel https://example.com --links
+## Why WebPeel?
 
-
-
+| Feature | **WebPeel** | Firecrawl | Jina Reader | MCP Fetch |
+|---------|:-----------:|:---------:|:-----------:|:---------:|
+| **Free tier** | ✅ 125/wk recurring | 500 one-time | ❌ Cloud only | ✅ Unlimited |
+| **Smart escalation** | ✅ HTTP→Browser→Stealth | Manual | ❌ | ❌ |
+| **Stealth mode** | ✅ All plans | ✅ | ⚠️ Limited | ❌ |
+| **Firecrawl-compatible** | ✅ Drop-in replacement | ✅ Native | ❌ | ❌ |
+| **Self-hosting** | ✅ Docker compose | ⚠️ Complex | ❌ | N/A |
+| **Autonomous agent** | ✅ BYOK any LLM | ⚠️ Locked | ❌ | ❌ |
+| **MCP tools** | ✅ 9 tools | 3 | 0 | 1 |
+| **License** | ✅ MIT | AGPL-3.0 | Proprietary | MIT |
+| **Pricing** | **Free / $9 / $29** | $0 / $16 / $83 | Custom | Free |
 
-
-cat urls.txt | npx webpeel batch
+## Install
 
-
-
+```bash
+# Node.js
+npm install webpeel # or: pnpm add webpeel
 
-#
-
+# Python
+pip install webpeel
 
-#
-webpeel
+# Global CLI
+npm install -g webpeel
 ```
 
-
+## Usage
 
-
-npm install webpeel
-```
+### Node.js
 
 ```typescript
 import { peel } from 'webpeel';
 
-// Simple usage
 const result = await peel('https://example.com');
 console.log(result.content); // Clean markdown
 console.log(result.metadata); // { title, description, author, ... }
 console.log(result.tokens); // Estimated token count
 
-//
-const
-
-
-//
-
-console.log(tracked.changeTracking); // { changeStatus: 'new' | 'same' | 'changed', diff: ... }
-
-// AI extraction with your own LLM key (v0.5.0)
-const extracted = await peel('https://example.com', {
-extract: { prompt: 'Extract the pricing plans', llmApiKey: 'sk-...' },
+// With options
+const advanced = await peel('https://example.com', {
+render: true, // Browser for JS-heavy sites
+stealth: true, // Anti-bot stealth mode
+maxTokens: 4000, // Limit output
+includeTags: ['main'], // Filter HTML tags
 });
-console.log(extracted.extracted);
 ```
 
-### Python
-
-```bash
-pip install webpeel
-```
+### Python
 
 ```python
 from webpeel import WebPeel
 
-client = WebPeel() # Free tier, no
+client = WebPeel() # Free tier, no key needed
 
-# Scrape
 result = client.scrape("https://example.com")
 print(result.content) # Clean markdown
 
-# Search
 results = client.search("python web scraping")
-
-# Crawl (async job)
 job = client.crawl("https://docs.example.com", limit=100)
-status = client.get_job(job.id)
-```
-
-Zero dependencies. Pure Python 3.8+ stdlib. [Full docs →](python-sdk/README.md)
-
-### MCP Server (Claude Desktop, Cursor, VS Code, Windsurf)
-
-WebPeel provides six MCP tools: `webpeel_fetch`, `webpeel_search`, `webpeel_crawl`, `webpeel_map`, `webpeel_extract`, and `webpeel_batch`.
-
-#### Claude Desktop
-
-Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
-
-```json
-{
-"mcpServers": {
-"webpeel": {
-"command": "npx",
-"args": ["-y", "webpeel", "mcp"]
-}
-}
-}
 ```
 
-
+Zero dependencies. Pure Python 3.8+. [Full SDK docs →](python-sdk/README.md)
 
-
-
-```json
-{
-"mcpServers": {
-"webpeel": {
-"command": "npx",
-"args": ["-y", "webpeel", "mcp"]
-}
-}
-}
-```
+### MCP Server
 
-
+9 tools for Claude Desktop, Cursor, VS Code, and Windsurf:
 
-
+`webpeel_fetch` · `webpeel_search` · `webpeel_crawl` · `webpeel_map` · `webpeel_extract` · `webpeel_batch` · `webpeel_agent` · `webpeel_summarize` · `webpeel_brand`
 
 ```json
 {
@@ -243,378 +132,129 @@ Create or edit `~/.vscode/mcp.json`:
 }
 ```
 
-Or install with one click:
-
 [](https://mcp.so/install/webpeel?for=claude)
 [](https://mcp.so/install/webpeel?for=vscode)
 
-
-
-Add to `~/.codeium/windsurf/mcp_config.json`:
+> **Where to add this config:** Claude Desktop → `~/Library/Application Support/Claude/claude_desktop_config.json` · Cursor → Settings → MCP Servers · VS Code → `~/.vscode/mcp.json` · Windsurf → `~/.codeium/windsurf/mcp_config.json`
 
-
-{
-"mcpServers": {
-"webpeel": {
-"command": "npx",
-"args": ["-y", "webpeel", "mcp"]
-}
-}
-}
-```
-
----
-
-## Use with Claude Code
-
-One command to add WebPeel to Claude Code:
+### Docker (Self-Hosted)
 
 ```bash
-
-
-
-Or add to your project's `.mcp.json` for team sharing:
-
-```json
-{
-"mcpServers": {
-"webpeel": {
-"command": "npx",
-"args": ["-y", "webpeel", "mcp"]
-}
-}
-}
-```
-
-This gives Claude Code access to:
-- **webpeel_fetch** — Fetch any URL as clean markdown (with stealth mode, actions, extraction & token budget)
-- **webpeel_search** — Search the web via DuckDuckGo
-- **webpeel_batch** — Fetch multiple URLs concurrently
-- **webpeel_crawl** — Crawl websites following links (with sitemap-first & deduplication)
-- **webpeel_map** — Discover all URLs on a domain via sitemap.xml & link crawling
-- **webpeel_extract** — Extract structured data using CSS selectors or JSON schema
-
----
-
-## How It Works: Smart Escalation
-
-WebPeel tries the fastest method first, then escalates only when needed:
-
+git clone https://github.com/webpeel/webpeel.git
+cd webpeel && docker compose up
 ```
-┌─────────────────────────────────────────────────────────────┐
-│ Smart Escalation │
-└─────────────────────────────────────────────────────────────┘
-
-Simple HTTP Fetch → Browser Rendering → Stealth Mode
-~200ms ~2 seconds ~5 seconds
-│ │ │
-├─ User-Agent headers ├─ Full JS execution ├─ Anti-detect
-├─ Cheerio parsing ├─ Wait for content ├─ Fingerprint mask
-├─ Fast & cheap ├─ Screenshots ├─ Cloudflare bypass
-│ │ │
-▼ ▼ ▼
-Works for 80% Works for 15% Works for 5%
-of websites (JS-heavy sites) (bot-protected)
-```
-
-**Why this matters:**
-- **Speed**: Don't waste 2 seconds rendering when 200ms will do
-- **Cost**: Headless browsers burn CPU and memory
-- **Reliability**: Auto-retry with browser if simple fetch fails
-
-WebPeel automatically detects blocked requests (403, 503, Cloudflare challenges) and retries with browser mode. You get the best of both worlds.
 
-
+Full API at `http://localhost:3000`. MIT licensed — no restrictions.
 
-##
+## Features
 
-###
+### 🎯 Smart Escalation
 
-
+Automatically uses the fastest method, escalates only when needed:
 
-```typescript
-interface PeelOptions {
-render?: boolean; // Force browser mode (default: false)
-wait?: number; // Wait time after page load in ms (default: 0)
-format?: 'markdown' | 'text' | 'html'; // Output format (default: 'markdown')
-timeout?: number; // Request timeout in ms (default: 30000)
-userAgent?: string; // Custom user agent
-}
-
-interface PeelResult {
-url: string; // Final URL (after redirects)
-title: string; // Page title
-content: string; // Page content in requested format
-metadata: { // Extracted metadata
-description?: string;
-author?: string;
-published?: string; // ISO 8601 date
-image?: string; // Open Graph image
-canonical?: string;
-};
-links: string[]; // All links on page (absolute URLs)
-tokens: number; // Estimated token count
-method: 'simple' | 'browser'; // Method used
-elapsed: number; // Time taken (ms)
-}
 ```
-
-
-
-```typescript
-import { TimeoutError, BlockedError, NetworkError } from 'webpeel';
-
-try {
-const result = await peel('https://example.com');
-} catch (error) {
-if (error instanceof TimeoutError) {
-// Request timed out
-} else if (error instanceof BlockedError) {
-// Site blocked the request (403, Cloudflare, etc.)
-} else if (error instanceof NetworkError) {
-// Network/DNS error
-}
-}
+HTTP Fetch (200ms) → Browser Rendering (2s) → Stealth Mode (5s)
+80% of sites 15% of sites 5% of sites
 ```
 
-###
-
-Clean up browser resources. Call this when you're done using WebPeel in your application:
+### 🎭 Stealth Mode
 
-
-import { peel, cleanup } from 'webpeel';
-
-// ... use peel() ...
+Bypass Cloudflare and bot detection. Masks browser fingerprints, navigator properties, WebGL vendor.
 
-
+```bash
+npx webpeel https://protected-site.com --stealth
 ```
 
-
+### 🕷️ Crawl & Map
 
-
-
-Live at `https://api.webpeel.dev` — authentication required after first 25 fetches.
+Crawl websites with link following, sitemap discovery, robots.txt compliance, and deduplication.
 
 ```bash
-
-
--H "Content-Type: application/json" \
--d '{"email":"you@example.com","password":"your-password"}'
-
-# Fetch a page
-curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
--H "Authorization: Bearer wp_live_your_api_key"
+npx webpeel crawl https://docs.example.com --max-pages 100
+npx webpeel map https://example.com --max-urls 5000
 ```
 
-###
-
-Usage resets every **Monday at 00:00 UTC**, just like Claude Code.
-
-| Plan | Price | Weekly Fetches | Burst Limit | All Features | Extra Usage |
-|------|------:|---------------:|:-----------:|:------------:|:-----------:|
-| **Free** | $0 | 125/wk (~500/mo) | 25/hr | ✅ | ❌ |
-| **Pro** | $9/mo | 1,250/wk (~5K/mo) | 100/hr | ✅ | ✅ |
-| **Max** | $29/mo | 6,250/wk (~25K/mo) | 500/hr | ✅ | ✅ |
-
-**Three layers of usage control:**
-1. **Burst limit** — Per-hour cap (25/hr free, 100/hr Pro, 500/hr Max) prevents hammering
-2. **Weekly limit** — Main usage gate, resets every Monday
-3. **Extra usage** — When you hit your weekly limit, keep fetching at pay-as-you-go rates
-
-**Extra usage rates (Pro/Max only):**
-| Fetch Type | Cost |
-|-----------|------|
-| Basic (HTTP) | $0.002 |
-| Stealth (browser) | $0.01 |
-| Search | $0.001 |
-
-### Why WebPeel Beats Firecrawl
-
-| Feature | WebPeel Free | WebPeel Pro | Firecrawl Hobby |
-|---------|:-------------:|:-----------:|:---------------:|
-| **Price** | $0 | $9/mo | $16/mo |
-| **Weekly Fetches** | 125/wk | 1,250/wk | ~750/wk |
-| **Rollover** | ❌ | ✅ 1 week | ❌ Expire monthly |
-| **Soft Limits** | ✅ Degrades | ✅ Never locked out | ❌ Hard cut-off |
-| **Extra Usage** | ❌ | ✅ Pay-as-you-go | ❌ Upgrade only |
-| **Self-Host** | ✅ MIT | ✅ MIT | ❌ AGPL |
-
-**Key differentiators:**
-- **Like Claude Code** — Generous free tier (125/week), pay when you need more
-- **Weekly resets** — Your usage refreshes every Monday, not once a month
-- **Soft limits on every tier** — At 100%, we degrade gracefully instead of blocking you
-- **Extra usage** — Pro/Max users can toggle on pay-as-you-go with spending caps (no surprise bills)
-- **First 25 free** — Try it instantly, no signup required
-- **Open source** — MIT licensed, self-host if you want full control
-
-See pricing at [webpeel.dev](https://webpeel.dev/#pricing)
-
----
-
-## Examples
+### 🤖 Autonomous Agent (BYOK)
 
-
+Give it a prompt, it researches the web using your own LLM key.
 
-```
-
-
-console.log(result.metadata);
-// {
-// title: "How We Built WebPeel",
-// description: "A deep dive into smart escalation...",
-// author: "Jake Liu",
-// published: "2026-02-12T18:00:00Z",
-// image: "https://example.com/og-image.png"
-// }
+```bash
+npx webpeel agent "Compare pricing of Notion vs Coda" --llm-key sk-...
 ```
 
-###
+### 📊 More Features
 
-
-
-
-
-
-
-
-
-
-
-
+| Feature | CLI | Node.js | Python | API |
+|---------|:---:|:-------:|:------:|:---:|
+| Structured extraction | ✅ | ✅ | ✅ | ✅ |
+| Screenshots | ✅ | ✅ | — | ✅ |
+| Branding extraction | ✅ | ✅ | — | — |
+| Change tracking | ✅ | ✅ | — | — |
+| Token budget | ✅ | ✅ | ✅ | ✅ |
+| Tag filtering | ✅ | ✅ | ✅ | ✅ |
+| Image extraction | ✅ | ✅ | — | ✅ |
+| AI summarization | ✅ | ✅ | — | ✅ |
+| Batch processing | — | ✅ | — | ✅ |
+| PDF extraction | ✅ | ✅ | — | — |
 
-
-
-```typescript
-// Twitter/X requires JavaScript
-const result = await peel('https://x.com/elonmusk', {
-render: true,
-wait: 2000, // Wait for tweets to load
-});
+## Integrations
 
-
-```
+Works with **LangChain**, **LlamaIndex**, **CrewAI**, **Dify**, and **n8n**. [Integration docs →](https://webpeel.dev/docs)
 
-
+## Hosted API
 
-
-const result = await peel('https://example.com/long-article');
+Live at [`api.webpeel.dev`](https://api.webpeel.dev) — Firecrawl-compatible endpoints.
 
-
-
+```bash
+# Fetch a page (free, no auth needed for first 25)
+curl "https://api.webpeel.dev/v1/fetch?url=https://example.com"
 
-
-
-
+# With API key
+curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
+-H "Authorization: Bearer wp_..."
 ```
 
-
+### Pricing
 
-
+| Plan | Price | Weekly Fetches | Burst | Extra Usage |
+|------|------:|---------------:|:-----:|:-----------:|
+| **Free** | $0 | 125/wk | 25/hr | — |
+| **Pro** | $9/mo | 1,250/wk | 100/hr | ✅ from $0.001 |
+| **Max** | $29/mo | 6,250/wk | 500/hr | ✅ from $0.001 |
 
-
-- **Research**: Bulk extract articles, docs, or social media
-- **Monitoring**: Track content changes on websites
-- **Archiving**: Save web pages as clean markdown
-- **Data Pipelines**: Extract structured data from web sources
-
----
+Extra credit costs: fetch $0.002, search $0.001, stealth $0.01. Resets every Monday. All features on all plans. [Compare with Firecrawl →](https://webpeel.dev/migrate-from-firecrawl)
 
 ## Development
 
 ```bash
-
-git clone https://github.com/JakeLiuMe/webpeel.git
+git clone https://github.com/webpeel/webpeel.git
 cd webpeel
-
-# Install dependencies
-npm install
-
-# Build
-npm run build
-
-# Run tests
+npm install && npm run build
 npm test
-
-# Watch mode (auto-rebuild)
-npm run dev
-
-# Test the CLI locally
-node dist/cli.js https://example.com
-
-# Test the MCP server
-npm run mcp
 ```
 
 See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
 
-
-
-## Roadmap
-
-- [x] CLI with smart escalation
-- [x] TypeScript library
-- [x] MCP server for Claude/Cursor/VS Code
-- [x] Hosted API with authentication and usage tracking
-- [x] Rate limiting and caching
-- [x] Batch processing API (`batch <file>`)
-- [x] Screenshot capture (`--screenshot`)
-- [x] CSS selector filtering (`--selector`, `--exclude`)
-- [x] DuckDuckGo search (`search <query>`)
-- [x] Custom headers and cookies
-- [x] Weekly reset usage model with extra usage
-- [x] Stealth mode (playwright-extra + anti-detect)
-- [x] Crawl mode (follow links, respect robots.txt)
-- [x] PDF extraction (v0.4.0)
-- [x] Structured data extraction with CSS selectors and JSON schema (v0.4.0)
-- [x] Page actions: click, scroll, type, fill, select, press, hover (v0.4.0)
-- [x] Map/sitemap discovery for full site URL mapping (v0.4.0)
-- [x] Token budget for output truncation (v0.4.0)
-- [x] Advanced crawl: sitemap-first, BFS/DFS, content deduplication (v0.4.0)
-- [ ] Webhook notifications for monitoring
-
-Vote on features and roadmap at [GitHub Discussions](https://github.com/JakeLiuMe/webpeel/discussions).
-
----
-
-## FAQ
-
-**Q: How is this different from Firecrawl?**
-A: WebPeel has a more generous free tier (125/week vs Firecrawl's 500 one-time credits) and uses weekly resets like Claude Code. We also have smart escalation to avoid burning resources on simple pages.
+## Links
 
-
-A: Yes! Run `npm run serve` to start the API server. See [docs/self-hosting.md](docs/self-hosting.md) (coming soon).
+[Documentation](https://webpeel.dev/docs) · [Playground](https://webpeel.dev/playground) · [API Reference](https://webpeel.dev/docs/api-reference) · [npm](https://www.npmjs.com/package/webpeel) · [PyPI](https://pypi.org/project/webpeel/) · [Migration Guide](https://webpeel.dev/migrate-from-firecrawl) · [Blog](https://webpeel.dev/blog) · [Discussions](https://github.com/webpeel/webpeel/discussions)
 
-
-A: WebPeel is a tool — how you use it is up to you. Always check a site's ToS before fetching at scale. We recommend respecting `robots.txt` in your own workflows.
+## Star History
 
-
-
-
-
-
-
-
-
-## Credits
-
-Built with:
-- [Playwright](https://playwright.dev/) — Headless browser automation
-- [Cheerio](https://cheerio.js.org/) — Fast HTML parsing
-- [Turndown](https://github.com/mixmark-io/turndown) — HTML to Markdown conversion
-- [Commander](https://github.com/tj/commander.js) — CLI framework
-
----
-
-## Contributing
-
-Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
-
----
+<a href="https://star-history.com/#webpeel/webpeel&Date">
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=webpeel/webpeel&type=Date&theme=dark" />
+<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=webpeel/webpeel&type=Date" />
+<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=webpeel/webpeel&type=Date" width="600" />
+</picture>
+</a>
 
 ## License
 
-MIT © [
+MIT © [WebPeel](https://github.com/webpeel)
 
 ---
 
-
+<p align="center">
+<b>Like WebPeel?</b> <a href="https://github.com/webpeel/webpeel">⭐ Star us on GitHub</a> — it helps others discover the project!
+</p>
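
Reviewer note on the README change: both versions describe an escalation ladder (HTTP fetch, then browser rendering, then stealth mode), and the removed text says blocked requests (403, 503, Cloudflare challenges) trigger a retry at the next level. The sketch below illustrates that idea only. It is not WebPeel's implementation: `Tier`, `Attempt`, `Fetcher`, `looksBlocked`, and `fetchWithEscalation` are hypothetical names, and the challenge-page check is an assumed heuristic.

```typescript
// Hypothetical tier names, taken from the README's escalation ladder.
type Tier = "http" | "browser" | "stealth";

interface Attempt {
  status: number; // HTTP status code of the response
  body: string;   // raw response body
}

// Hypothetical fetcher signature: the caller supplies one per tier.
type Fetcher = (url: string) => Promise<Attempt>;

// Cheapest tier first, most expensive last.
const ORDER: Tier[] = ["http", "browser", "stealth"];

// Assumed heuristic: 403/503 or a challenge-page marker counts as blocked.
function looksBlocked(attempt: Attempt): boolean {
  if (attempt.status === 403 || attempt.status === 503) return true;
  return /just a moment|cf-challenge/i.test(attempt.body);
}

// Try each tier in order and return the first unblocked result.
async function fetchWithEscalation(
  url: string,
  fetchers: Record<Tier, Fetcher>,
): Promise<{ tier: Tier; attempt: Attempt }> {
  for (const tier of ORDER) {
    const attempt = await fetchers[tier](url);
    if (!looksBlocked(attempt)) return { tier, attempt };
  }
  throw new Error(`all tiers were blocked for ${url}`);
}
```

Passing the fetchers in as arguments keeps the ladder logic independent of any HTTP client or browser driver, so the ordering and the block check can be exercised with stubs.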