webpeel 0.6.1 → 0.7.1

Files changed (72)
  1. package/README.md +140 -500
  2. package/dist/cli-auth.d.ts +2 -0
  3. package/dist/cli-auth.d.ts.map +1 -1
  4. package/dist/cli-auth.js +16 -3
  5. package/dist/cli-auth.js.map +1 -1
  6. package/dist/cli.js +475 -77
  7. package/dist/cli.js.map +1 -1
  8. package/dist/core/actions.d.ts +19 -10
  9. package/dist/core/actions.d.ts.map +1 -1
  10. package/dist/core/actions.js +214 -43
  11. package/dist/core/actions.js.map +1 -1
  12. package/dist/core/agent.d.ts +60 -3
  13. package/dist/core/agent.d.ts.map +1 -1
  14. package/dist/core/agent.js +375 -86
  15. package/dist/core/agent.js.map +1 -1
  16. package/dist/core/answer.d.ts +43 -0
  17. package/dist/core/answer.d.ts.map +1 -0
  18. package/dist/core/answer.js +378 -0
  19. package/dist/core/answer.js.map +1 -0
  20. package/dist/core/cache.d.ts +14 -0
  21. package/dist/core/cache.d.ts.map +1 -0
  22. package/dist/core/cache.js +122 -0
  23. package/dist/core/cache.js.map +1 -0
  24. package/dist/core/dns-cache.d.ts +21 -0
  25. package/dist/core/dns-cache.d.ts.map +1 -0
  26. package/dist/core/dns-cache.js +184 -0
  27. package/dist/core/dns-cache.js.map +1 -0
  28. package/dist/core/documents.d.ts +24 -0
  29. package/dist/core/documents.d.ts.map +1 -0
  30. package/dist/core/documents.js +124 -0
  31. package/dist/core/documents.js.map +1 -0
  32. package/dist/core/extract-inline.d.ts +39 -0
  33. package/dist/core/extract-inline.d.ts.map +1 -0
  34. package/dist/core/extract-inline.js +214 -0
  35. package/dist/core/extract-inline.js.map +1 -0
  36. package/dist/core/fetcher.d.ts +33 -7
  37. package/dist/core/fetcher.d.ts.map +1 -1
  38. package/dist/core/fetcher.js +608 -41
  39. package/dist/core/fetcher.js.map +1 -1
  40. package/dist/core/jobs.d.ts +66 -0
  41. package/dist/core/jobs.d.ts.map +1 -0
  42. package/dist/core/jobs.js +513 -0
  43. package/dist/core/jobs.js.map +1 -0
  44. package/dist/core/markdown.d.ts.map +1 -1
  45. package/dist/core/markdown.js +141 -31
  46. package/dist/core/markdown.js.map +1 -1
  47. package/dist/core/pdf.d.ts.map +1 -1
  48. package/dist/core/pdf.js +3 -1
  49. package/dist/core/pdf.js.map +1 -1
  50. package/dist/core/screenshot.d.ts +33 -0
  51. package/dist/core/screenshot.d.ts.map +1 -0
  52. package/dist/core/screenshot.js +30 -0
  53. package/dist/core/screenshot.js.map +1 -0
  54. package/dist/core/search-provider.d.ts +46 -0
  55. package/dist/core/search-provider.d.ts.map +1 -0
  56. package/dist/core/search-provider.js +281 -0
  57. package/dist/core/search-provider.js.map +1 -0
  58. package/dist/core/strategies.d.ts +7 -10
  59. package/dist/core/strategies.d.ts.map +1 -1
  60. package/dist/core/strategies.js +370 -63
  61. package/dist/core/strategies.js.map +1 -1
  62. package/dist/index.d.ts +9 -3
  63. package/dist/index.d.ts.map +1 -1
  64. package/dist/index.js +61 -32
  65. package/dist/index.js.map +1 -1
  66. package/dist/mcp/server.js +335 -70
  67. package/dist/mcp/server.js.map +1 -1
  68. package/dist/types.d.ts +43 -1
  69. package/dist/types.d.ts.map +1 -1
  70. package/dist/types.js.map +1 -1
  71. package/llms.txt +85 -47
  72. package/package.json +11 -5
package/README.md CHANGED
@@ -1,236 +1,125 @@
1
- # WebPeel
1
+ <p align="center">
2
+ <a href="https://webpeel.dev">
3
+ <img src=".github/banner.svg" alt="WebPeel — Web fetching for AI agents" width="100%">
4
+ </a>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <a href="https://www.npmjs.com/package/webpeel"><img src="https://img.shields.io/npm/v/webpeel.svg" alt="npm version"></a>
9
+ <a href="https://pypi.org/project/webpeel/"><img src="https://img.shields.io/pypi/v/webpeel.svg" alt="PyPI version"></a>
10
+ <a href="https://www.npmjs.com/package/webpeel"><img src="https://img.shields.io/npm/dm/webpeel.svg" alt="npm downloads"></a>
11
+ <a href="https://github.com/webpeel/webpeel/stargazers"><img src="https://img.shields.io/github/stars/webpeel/webpeel.svg" alt="GitHub stars"></a>
12
+ <a href="https://github.com/webpeel/webpeel/actions/workflows/ci.yml"><img src="https://github.com/webpeel/webpeel/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
13
+ <a href="https://www.typescriptlang.org/"><img src="https://img.shields.io/badge/TypeScript-5.6-blue.svg" alt="TypeScript"></a>
14
+ <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="MIT License"></a>
15
+ </p>
16
+
17
+ <p align="center">
18
+ <b>Turn any web page into AI-ready markdown. Smart escalation. Stealth mode. Free to start.</b>
19
+ </p>
20
+
21
+ <p align="center">
22
+ <a href="https://webpeel.dev">Website</a> ·
23
+ <a href="https://webpeel.dev/docs">Docs</a> ·
24
+ <a href="https://webpeel.dev/playground">Playground</a> ·
25
+ <a href="https://app.webpeel.dev">Dashboard</a> ·
26
+ <a href="https://github.com/webpeel/webpeel/discussions">Discussions</a>
27
+ </p>
2
28
 
3
- [![npm version](https://img.shields.io/npm/v/webpeel.svg)](https://www.npmjs.com/package/webpeel)
4
- [![npm downloads](https://img.shields.io/npm/dm/webpeel.svg)](https://www.npmjs.com/package/webpeel)
5
- [![GitHub stars](https://img.shields.io/github/stars/JakeLiuMe/webpeel.svg)](https://github.com/JakeLiuMe/webpeel/stargazers)
6
- [![CI](https://github.com/JakeLiuMe/webpeel/actions/workflows/ci.yml/badge.svg)](https://github.com/JakeLiuMe/webpeel/actions/workflows/ci.yml)
7
- [![TypeScript](https://img.shields.io/badge/TypeScript-5.6-blue.svg)](https://www.typescriptlang.org/)
8
- [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
29
+ ---
9
30
 
10
- Turn any web page into clean markdown. **Smart escalation. Stealth mode. Crawl mode. Free to start.**
31
+ ## Quick Start
11
32
 
12
33
  ```bash
34
+ # Zero install — just run it
13
35
  npx webpeel https://news.ycombinator.com
14
36
  ```
15
37
 
16
- **Output:**
17
- ```markdown
18
- # Hacker News
19
-
20
- **New** | **Past** | **Comments** | **Ask** | **Show** | **Jobs** | **Submit**
21
-
22
- ## Top Stories
23
-
24
- 1. **Show HN: WebPeel – Turn any webpage into AI-ready markdown**
25
- [https://github.com/JakeLiuMe/webpeel](https://github.com/JakeLiuMe/webpeel)
26
- 142 points by jakeliu 2 hours ago | 31 comments
27
-
28
- 2. **The End of the API Era**
29
- ...
30
- ```
31
-
32
- ---
33
-
34
- ## Why WebPeel?
35
-
36
- | | **WebPeel** | Firecrawl | Jina Reader | MCP Fetch |
37
- |---|:---:|:---:|:---:|:---:|
38
- | **Free tier** | ✅ 125/week | 500 one-time | ❌ Cloud only | ✅ Unlimited |
39
- | **Smart escalation** | ✅ HTTP→Browser→Stealth | Manual mode | ❌ No | ❌ No |
40
- | **Stealth mode** | ✅ All plans | ✅ Yes | ⚠️ Limited | ❌ No |
41
- | **Crawl + Map** | ✅ All plans | ✅ Yes | ❌ No | ❌ No |
42
- | **AI Extraction** | ✅ BYOK (any LLM) | ✅ Built-in | ❌ No | ❌ No |
43
- | **Branding** | ✅ Design system | ✅ Yes | ❌ No | ❌ No |
44
- | **Change Tracking** | ✅ Local snapshots | ✅ Server-side | ❌ No | ❌ No |
45
- | **Python SDK** | ✅ Zero deps | ✅ httpx/pydantic | ❌ No | ❌ No |
46
- | **LangChain** | ✅ Official | ✅ Official | ❌ No | ❌ No |
47
- | **MCP Server** | ✅ Built-in (6 tools) | ✅ Separate repo | ❌ No | ✅ Yes |
48
- | **Token Budget** | ✅ `--max-tokens` | ❌ No | ❌ No | ❌ No |
49
- | **Zero config** | ✅ `npx webpeel` | ❌ API key required | ❌ API key required | ✅ Yes |
50
- | **Pricing** | $0 local / $9-$29 | $16-$333/mo | $10/mo+ | Free |
51
- | **License** | MIT | AGPL-3.0 | Proprietary | MIT |
52
-
53
- **WebPeel gives you Firecrawl's power with a generous free tier and MIT license.**
54
-
55
- ### Usage Model
56
-
57
- WebPeel uses a **weekly usage budget** for all users (CLI and API):
58
-
59
- - **First 25 fetches**: No account needed — try it instantly
60
- - **Free tier**: 125 fetches/week (resets every Monday)
61
- - **Pro tier**: 1,250 fetches/week ($9/mo)
62
- - **Max tier**: 6,250 fetches/week ($29/mo)
63
-
64
- **Credit costs**: Basic fetch = 1 credit, Stealth mode = 5 credits, Search = 1 credit, Crawl = 1 credit/page
65
-
66
- **Open source**: The CLI is MIT licensed — you can self-host if needed. But the hosted API requires authentication after 25 fetches.
67
-
68
- ### Highlights
69
-
70
- 1. **🎭 Stealth Mode** — Bypass bot detection with playwright-extra stealth plugin. Works on sites that block regular scrapers.
71
- 2. **🕷️ Crawl Mode** — Follow links and extract entire sites. Respects robots.txt and rate limits automatically.
72
- 3. **💰 Generous Free Tier** — 125 free fetches every week. First 25 work instantly with no signup. Basic fetch + JS rendering included free.
73
-
74
- ---
75
-
76
- ## Quick Start
77
-
78
- ### CLI (Zero Install)
79
-
80
38
  ```bash
81
- # First 25 fetches work instantly, no signup
82
- npx webpeel https://example.com
83
-
84
- # After 25 fetches, sign up for free (125/week)
85
- webpeel login
86
-
87
- # Check your usage
88
- webpeel usage
89
-
90
39
  # Stealth mode (bypass bot detection)
91
40
  npx webpeel https://protected-site.com --stealth
92
41
 
93
- # Page actions: click, scroll, type before extraction (v0.4.0)
94
- npx webpeel https://example.com --action "click:.cookie-accept" --action "wait:2000" --action "scroll:bottom"
95
-
96
- # Structured data extraction with CSS selectors (v0.4.0)
97
- npx webpeel https://example.com --extract '{"title": "h1", "price": ".price", "description": ".desc"}'
98
-
99
- # Token budget: truncate output to max tokens (v0.4.0)
100
- npx webpeel https://example.com --max-tokens 2000
101
-
102
- # Map discovery: find all URLs on a domain via sitemap & crawling (v0.4.0)
103
- npx webpeel map https://example.com --max-urls 5000
104
-
105
- # Extract branding/design system from a page (v0.5.0)
106
- npx webpeel brand https://example.com
42
+ # Crawl a website
43
+ npx webpeel crawl https://example.com --max-pages 20
107
44
 
108
- # Track content changes over time (v0.5.0)
109
- npx webpeel track https://example.com
45
+ # Search the web
46
+ npx webpeel search "best AI frameworks 2026"
110
47
 
111
- # Crawl a website (follow links, respect robots.txt)
112
- npx webpeel crawl https://example.com --max-pages 20 --max-depth 2
113
-
114
- # Sitemap-first crawl with content deduplication (v0.4.0)
115
- npx webpeel crawl https://example.com --sitemap-first --max-pages 100
116
-
117
- # JSON output with metadata
118
- npx webpeel https://example.com --json
48
+ # Autonomous agent (BYOK LLM)
49
+ npx webpeel agent "Find the founders of Stripe" --llm-key sk-...
50
+ ```
119
51
 
120
- # Cache results locally (avoid repeat fetches)
121
- npx webpeel https://example.com --cache 5m
52
+ First 25 fetches work instantly, no signup. After that, [sign up free](https://app.webpeel.dev/signup) for 125/week.
122
53
 
123
- # Extract just the links from a page
124
- npx webpeel https://example.com --links
54
+ ## Why WebPeel?
125
55
 
126
- # Extract just the metadata (title, description, author)
127
- npx webpeel https://example.com --meta
56
+ | Feature | **WebPeel** | Firecrawl | Jina Reader | MCP Fetch |
57
+ |---------|:-----------:|:---------:|:-----------:|:---------:|
58
+ | **Free tier** | ✅ 125/wk recurring | 500 one-time | ❌ Cloud only | ✅ Unlimited |
59
+ | **Smart escalation** | ✅ HTTP→Browser→Stealth | Manual | ❌ | ❌ |
60
+ | **Stealth mode** | ✅ All plans | ✅ | ⚠️ Limited | ❌ |
61
+ | **Firecrawl-compatible** | ✅ Drop-in replacement | ✅ Native | ❌ | ❌ |
62
+ | **Self-hosting** | ✅ Docker compose | ⚠️ Complex | ❌ | N/A |
63
+ | **Autonomous agent** | ✅ BYOK any LLM | ⚠️ Locked | ❌ | ❌ |
64
+ | **MCP tools** | ✅ 9 tools | 3 | 0 | 1 |
65
+ | **License** | ✅ MIT | AGPL-3.0 | Proprietary | MIT |
66
+ | **Pricing** | **Free / $9 / $29** | $0 / $16 / $83 | Custom | Free |
128
67
 
129
- # Batch fetch from file or stdin
130
- cat urls.txt | npx webpeel batch
68
+ ## Install
131
69
 
132
- # Force browser rendering (for JS-heavy sites)
133
- npx webpeel https://x.com/elonmusk --render
70
+ ```bash
71
+ # Node.js
72
+ npm install webpeel # or: pnpm add webpeel
134
73
 
135
- # Wait for dynamic content
136
- npx webpeel https://example.com --render --wait 3000
74
+ # Python
75
+ pip install webpeel
137
76
 
138
- # View your config and cache stats
139
- webpeel config
77
+ # Global CLI
78
+ npm install -g webpeel
140
79
  ```
141
80
 
142
- ### Library (TypeScript)
81
+ ## Usage
143
82
 
144
- ```bash
145
- npm install webpeel
146
- ```
83
+ ### Node.js
147
84
 
148
85
  ```typescript
149
86
  import { peel } from 'webpeel';
150
87
 
151
- // Simple usage
152
88
  const result = await peel('https://example.com');
153
89
  console.log(result.content); // Clean markdown
154
90
  console.log(result.metadata); // { title, description, author, ... }
155
91
  console.log(result.tokens); // Estimated token count
156
92
 
157
- // Branding extraction (v0.5.0)
158
- const brand = await peel('https://stripe.com', { branding: true, render: true });
159
- console.log(brand.branding); // { colors, fonts, typography, cssVariables, ... }
160
-
161
- // Change tracking (v0.5.0)
162
- const tracked = await peel('https://example.com/pricing', { changeTracking: true });
163
- console.log(tracked.changeTracking); // { changeStatus: 'new' | 'same' | 'changed', diff: ... }
164
-
165
- // AI extraction with your own LLM key (v0.5.0)
166
- const extracted = await peel('https://example.com', {
167
- extract: { prompt: 'Extract the pricing plans', llmApiKey: 'sk-...' },
93
+ // With options
94
+ const advanced = await peel('https://example.com', {
95
+ render: true, // Browser for JS-heavy sites
96
+ stealth: true, // Anti-bot stealth mode
97
+ maxTokens: 4000, // Limit output
98
+ includeTags: ['main'], // Filter HTML tags
168
99
  });
169
- console.log(extracted.extracted);
170
100
  ```
171
101
 
172
- ### Python SDK (v0.5.0)
173
-
174
- ```bash
175
- pip install webpeel
176
- ```
102
+ ### Python
177
103
 
178
104
  ```python
179
105
  from webpeel import WebPeel
180
106
 
181
- client = WebPeel() # Free tier, no API key needed
107
+ client = WebPeel() # Free tier, no key needed
182
108
 
183
- # Scrape
184
109
  result = client.scrape("https://example.com")
185
110
  print(result.content) # Clean markdown
186
111
 
187
- # Search
188
112
  results = client.search("python web scraping")
189
-
190
- # Crawl (async job)
191
113
  job = client.crawl("https://docs.example.com", limit=100)
192
- status = client.get_job(job.id)
193
- ```
194
-
195
- Zero dependencies. Pure Python 3.8+ stdlib. [Full docs →](python-sdk/README.md)
196
-
197
- ### MCP Server (Claude Desktop, Cursor, VS Code, Windsurf)
198
-
199
- WebPeel provides six MCP tools: `webpeel_fetch`, `webpeel_search`, `webpeel_crawl`, `webpeel_map`, `webpeel_extract`, and `webpeel_batch`.
200
-
201
- #### Claude Desktop
202
-
203
- Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
204
-
205
- ```json
206
- {
207
- "mcpServers": {
208
- "webpeel": {
209
- "command": "npx",
210
- "args": ["-y", "webpeel", "mcp"]
211
- }
212
- }
213
- }
214
114
  ```
215
115
 
216
- #### Cursor
116
+ Zero dependencies. Pure Python 3.8+. [Full SDK docs →](python-sdk/README.md)
217
117
 
218
- Add to Cursor Settings → MCP Servers:
219
-
220
- ```json
221
- {
222
- "mcpServers": {
223
- "webpeel": {
224
- "command": "npx",
225
- "args": ["-y", "webpeel", "mcp"]
226
- }
227
- }
228
- }
229
- ```
118
+ ### MCP Server
230
119
 
231
- #### VS Code (with Cline or other MCP clients)
120
+ 9 tools for Claude Desktop, Cursor, VS Code, and Windsurf:
232
121
 
233
- Create or edit `~/.vscode/mcp.json`:
122
+ `webpeel_fetch` · `webpeel_search` · `webpeel_crawl` · `webpeel_map` · `webpeel_extract` · `webpeel_batch` · `webpeel_agent` · `webpeel_summarize` · `webpeel_brand`
234
123
 
235
124
  ```json
236
125
  {
@@ -243,378 +132,129 @@ Create or edit `~/.vscode/mcp.json`:
243
132
  }
244
133
  ```
245
134
 
246
- Or install with one click:
247
-
248
135
  [![Install in Claude Desktop](https://img.shields.io/badge/Install-Claude%20Desktop-5B3FFF?style=for-the-badge&logo=anthropic)](https://mcp.so/install/webpeel?for=claude)
249
136
  [![Install in VS Code](https://img.shields.io/badge/Install-VS%20Code-007ACC?style=for-the-badge&logo=visualstudiocode)](https://mcp.so/install/webpeel?for=vscode)
250
137
 
251
- #### Windsurf
252
-
253
- Add to `~/.codeium/windsurf/mcp_config.json`:
138
+ > **Where to add this config:** Claude Desktop → `~/Library/Application Support/Claude/claude_desktop_config.json` · Cursor → Settings → MCP Servers · VS Code → `~/.vscode/mcp.json` · Windsurf → `~/.codeium/windsurf/mcp_config.json`
254
139
 
255
- ```json
256
- {
257
- "mcpServers": {
258
- "webpeel": {
259
- "command": "npx",
260
- "args": ["-y", "webpeel", "mcp"]
261
- }
262
- }
263
- }
264
- ```
265
-
266
- ---
267
-
268
- ## Use with Claude Code
269
-
270
- One command to add WebPeel to Claude Code:
140
+ ### Docker (Self-Hosted)
271
141
 
272
142
  ```bash
273
- claude mcp add webpeel -- npx -y webpeel mcp
274
- ```
275
-
276
- Or add to your project's `.mcp.json` for team sharing:
277
-
278
- ```json
279
- {
280
- "mcpServers": {
281
- "webpeel": {
282
- "command": "npx",
283
- "args": ["-y", "webpeel", "mcp"]
284
- }
285
- }
286
- }
287
- ```
288
-
289
- This gives Claude Code access to:
290
- - **webpeel_fetch** — Fetch any URL as clean markdown (with stealth mode, actions, extraction & token budget)
291
- - **webpeel_search** — Search the web via DuckDuckGo
292
- - **webpeel_batch** — Fetch multiple URLs concurrently
293
- - **webpeel_crawl** — Crawl websites following links (with sitemap-first & deduplication)
294
- - **webpeel_map** — Discover all URLs on a domain via sitemap.xml & link crawling
295
- - **webpeel_extract** — Extract structured data using CSS selectors or JSON schema
296
-
297
- ---
298
-
299
- ## How It Works: Smart Escalation
300
-
301
- WebPeel tries the fastest method first, then escalates only when needed:
302
-
143
+ git clone https://github.com/webpeel/webpeel.git
144
+ cd webpeel && docker compose up
303
145
  ```
304
- ┌─────────────────────────────────────────────────────────────┐
305
- │ Smart Escalation │
306
- └─────────────────────────────────────────────────────────────┘
307
-
308
- Simple HTTP Fetch → Browser Rendering → Stealth Mode
309
- ~200ms ~2 seconds ~5 seconds
310
- │ │ │
311
- ├─ User-Agent headers ├─ Full JS execution ├─ Anti-detect
312
- ├─ Cheerio parsing ├─ Wait for content ├─ Fingerprint mask
313
- ├─ Fast & cheap ├─ Screenshots ├─ Cloudflare bypass
314
- │ │ │
315
- ▼ ▼ ▼
316
- Works for 80% Works for 15% Works for 5%
317
- of websites (JS-heavy sites) (bot-protected)
318
- ```
319
-
320
- **Why this matters:**
321
- - **Speed**: Don't waste 2 seconds rendering when 200ms will do
322
- - **Cost**: Headless browsers burn CPU and memory
323
- - **Reliability**: Auto-retry with browser if simple fetch fails
324
-
325
- WebPeel automatically detects blocked requests (403, 503, Cloudflare challenges) and retries with browser mode. You get the best of both worlds.
326
146
 
327
- ---
147
+ Full API at `http://localhost:3000`. MIT licensed — no restrictions.
328
148
 
329
- ## API Reference
149
+ ## Features
330
150
 
331
- ### `peel(url, options?)`
151
+ ### 🎯 Smart Escalation
332
152
 
333
- Fetch and extract content from a URL.
153
+ Automatically uses the fastest method, escalates only when needed:
334
154
 
335
- ```typescript
336
- interface PeelOptions {
337
- render?: boolean; // Force browser mode (default: false)
338
- wait?: number; // Wait time after page load in ms (default: 0)
339
- format?: 'markdown' | 'text' | 'html'; // Output format (default: 'markdown')
340
- timeout?: number; // Request timeout in ms (default: 30000)
341
- userAgent?: string; // Custom user agent
342
- }
343
-
344
- interface PeelResult {
345
- url: string; // Final URL (after redirects)
346
- title: string; // Page title
347
- content: string; // Page content in requested format
348
- metadata: { // Extracted metadata
349
- description?: string;
350
- author?: string;
351
- published?: string; // ISO 8601 date
352
- image?: string; // Open Graph image
353
- canonical?: string;
354
- };
355
- links: string[]; // All links on page (absolute URLs)
356
- tokens: number; // Estimated token count
357
- method: 'simple' | 'browser'; // Method used
358
- elapsed: number; // Time taken (ms)
359
- }
360
155
  ```
361
-
362
- ### Error Types
363
-
364
- ```typescript
365
- import { TimeoutError, BlockedError, NetworkError } from 'webpeel';
366
-
367
- try {
368
- const result = await peel('https://example.com');
369
- } catch (error) {
370
- if (error instanceof TimeoutError) {
371
- // Request timed out
372
- } else if (error instanceof BlockedError) {
373
- // Site blocked the request (403, Cloudflare, etc.)
374
- } else if (error instanceof NetworkError) {
375
- // Network/DNS error
376
- }
377
- }
156
+ HTTP Fetch (200ms) → Browser Rendering (2s) → Stealth Mode (5s)
157
+ 80% of sites 15% of sites 5% of sites
378
158
  ```
379
159
 
380
- ### `cleanup()`
381
-
382
- Clean up browser resources. Call this when you're done using WebPeel in your application:
160
+ ### 🎭 Stealth Mode
383
161
 
384
- ```typescript
385
- import { peel, cleanup } from 'webpeel';
386
-
387
- // ... use peel() ...
162
+ Bypass Cloudflare and bot detection. Masks browser fingerprints, navigator properties, WebGL vendor.
388
163
 
389
- await cleanup(); // Close browser instances
164
+ ```bash
165
+ npx webpeel https://protected-site.com --stealth
390
166
  ```
391
167
 
392
- ---
168
+ ### 🕷️ Crawl & Map
393
169
 
394
- ## Hosted API
395
-
396
- Live at `https://api.webpeel.dev` — authentication required after first 25 fetches.
170
+ Crawl websites with link following, sitemap discovery, robots.txt compliance, and deduplication.
397
171
 
398
172
  ```bash
399
- # Register and get your API key
400
- curl -X POST https://api.webpeel.dev/v1/auth/register \
401
- -H "Content-Type: application/json" \
402
- -d '{"email":"you@example.com","password":"your-password"}'
403
-
404
- # Fetch a page
405
- curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
406
- -H "Authorization: Bearer wp_live_your_api_key"
173
+ npx webpeel crawl https://docs.example.com --max-pages 100
174
+ npx webpeel map https://example.com --max-urls 5000
407
175
  ```
408
176
 
409
- ### Pricing Weekly Reset Model
410
-
411
- Usage resets every **Monday at 00:00 UTC**, just like Claude Code.
412
-
413
- | Plan | Price | Weekly Fetches | Burst Limit | All Features | Extra Usage |
414
- |------|------:|---------------:|:-----------:|:------------:|:-----------:|
415
- | **Free** | $0 | 125/wk (~500/mo) | 25/hr | ✅ | ❌ |
416
- | **Pro** | $9/mo | 1,250/wk (~5K/mo) | 100/hr | ✅ | ✅ |
417
- | **Max** | $29/mo | 6,250/wk (~25K/mo) | 500/hr | ✅ | ✅ |
418
-
419
- **Three layers of usage control:**
420
- 1. **Burst limit** — Per-hour cap (25/hr free, 100/hr Pro, 500/hr Max) prevents hammering
421
- 2. **Weekly limit** — Main usage gate, resets every Monday
422
- 3. **Extra usage** — When you hit your weekly limit, keep fetching at pay-as-you-go rates
423
-
424
- **Extra usage rates (Pro/Max only):**
425
- | Fetch Type | Cost |
426
- |-----------|------|
427
- | Basic (HTTP) | $0.002 |
428
- | Stealth (browser) | $0.01 |
429
- | Search | $0.001 |
430
-
431
- ### Why WebPeel Beats Firecrawl
432
-
433
- | Feature | WebPeel Free | WebPeel Pro | Firecrawl Hobby |
434
- |---------|:-------------:|:-----------:|:---------------:|
435
- | **Price** | $0 | $9/mo | $16/mo |
436
- | **Weekly Fetches** | 125/wk | 1,250/wk | ~750/wk |
437
- | **Rollover** | ❌ | ✅ 1 week | ❌ Expire monthly |
438
- | **Soft Limits** | ✅ Degrades | ✅ Never locked out | ❌ Hard cut-off |
439
- | **Extra Usage** | ❌ | ✅ Pay-as-you-go | ❌ Upgrade only |
440
- | **Self-Host** | ✅ MIT | ✅ MIT | ❌ AGPL |
441
-
442
- **Key differentiators:**
443
- - **Like Claude Code** — Generous free tier (125/week), pay when you need more
444
- - **Weekly resets** — Your usage refreshes every Monday, not once a month
445
- - **Soft limits on every tier** — At 100%, we degrade gracefully instead of blocking you
446
- - **Extra usage** — Pro/Max users can toggle on pay-as-you-go with spending caps (no surprise bills)
447
- - **First 25 free** — Try it instantly, no signup required
448
- - **Open source** — MIT licensed, self-host if you want full control
449
-
450
- See pricing at [webpeel.dev](https://webpeel.dev/#pricing)
451
-
452
- ---
453
-
454
- ## Examples
177
+ ### 🤖 Autonomous Agent (BYOK)
455
178
 
456
- ### Extract blog post metadata
179
+ Give it a prompt, it researches the web using your own LLM key.
457
180
 
458
- ```typescript
459
- const result = await peel('https://example.com/blog/post');
460
-
461
- console.log(result.metadata);
462
- // {
463
- // title: "How We Built WebPeel",
464
- // description: "A deep dive into smart escalation...",
465
- // author: "Jake Liu",
466
- // published: "2026-02-12T18:00:00Z",
467
- // image: "https://example.com/og-image.png"
468
- // }
181
+ ```bash
182
+ npx webpeel agent "Compare pricing of Notion vs Coda" --llm-key sk-...
469
183
  ```
470
184
 
471
- ### Get all links from a page
185
+ ### 📊 More Features
472
186
 
473
- ```typescript
474
- const result = await peel('https://news.ycombinator.com');
475
-
476
- console.log(result.links.slice(0, 5));
477
- // [
478
- // "https://news.ycombinator.com/newest",
479
- // "https://news.ycombinator.com/submit",
480
- // "https://github.com/example/repo",
481
- // ...
482
- // ]
483
- ```
187
+ | Feature | CLI | Node.js | Python | API |
188
+ |---------|:---:|:-------:|:------:|:---:|
189
+ | Structured extraction | ✅ | ✅ | ✅ | ✅ |
190
+ | Screenshots | ✅ | ✅ | — | ✅ |
191
+ | Branding extraction | ✅ | ✅ | — | — |
192
+ | Change tracking | ✅ | ✅ | — | — |
193
+ | Token budget | ✅ | ✅ | ✅ | ✅ |
194
+ | Tag filtering | ✅ | ✅ | ✅ | ✅ |
195
+ | Image extraction | ✅ | ✅ | — | ✅ |
196
+ | AI summarization | ✅ | ✅ | — | ✅ |
197
+ | Batch processing | — | ✅ | — | ✅ |
198
+ | PDF extraction | ✅ | ✅ | — | — |
484
199
 
485
- ### Force browser rendering for JavaScript-heavy sites
486
-
487
- ```typescript
488
- // Twitter/X requires JavaScript
489
- const result = await peel('https://x.com/elonmusk', {
490
- render: true,
491
- wait: 2000, // Wait for tweets to load
492
- });
200
+ ## Integrations
493
201
 
494
- console.log(result.content); // Rendered tweet content
495
- ```
202
+ Works with **LangChain**, **LlamaIndex**, **CrewAI**, **Dify**, and **n8n**. [Integration docs →](https://webpeel.dev/docs)
496
203
 
497
- ### Token counting for LLM usage
204
+ ## Hosted API
498
205
 
499
- ```typescript
500
- const result = await peel('https://example.com/long-article');
206
+ Live at [`api.webpeel.dev`](https://api.webpeel.dev) — Firecrawl-compatible endpoints.
501
207
 
502
- console.log(`Content is ~${result.tokens} tokens`);
503
- // Content is ~3,247 tokens
208
+ ```bash
209
+ # Fetch a page (free, no auth needed for first 25)
210
+ curl "https://api.webpeel.dev/v1/fetch?url=https://example.com"
504
211
 
505
- if (result.tokens > 4000) {
506
- console.log('Too long for GPT-3.5 context window');
507
- }
212
+ # With API key
213
+ curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
214
+ -H "Authorization: Bearer wp_..."
508
215
  ```
509
216
 
510
- ---
217
+ ### Pricing
511
218
 
512
- ## Use Cases
219
+ | Plan | Price | Weekly Fetches | Burst | Extra Usage |
220
+ |------|------:|---------------:|:-----:|:-----------:|
221
+ | **Free** | $0 | 125/wk | 25/hr | — |
222
+ | **Pro** | $9/mo | 1,250/wk | 100/hr | ✅ from $0.001 |
223
+ | **Max** | $29/mo | 6,250/wk | 500/hr | ✅ from $0.001 |
513
224
 
514
- - **AI Agents**: Feed web content to Claude, GPT, or local LLMs
515
- - **Research**: Bulk extract articles, docs, or social media
516
- - **Monitoring**: Track content changes on websites
517
- - **Archiving**: Save web pages as clean markdown
518
- - **Data Pipelines**: Extract structured data from web sources
519
-
520
- ---
225
+ Extra credit costs: fetch $0.002, search $0.001, stealth $0.01. Resets every Monday. All features on all plans. [Compare with Firecrawl →](https://webpeel.dev/migrate-from-firecrawl)
521
226
 
522
227
  ## Development
523
228
 
524
229
  ```bash
525
- # Clone the repo
526
- git clone https://github.com/JakeLiuMe/webpeel.git
230
+ git clone https://github.com/webpeel/webpeel.git
527
231
  cd webpeel
528
-
529
- # Install dependencies
530
- npm install
531
-
532
- # Build
533
- npm run build
534
-
535
- # Run tests
232
+ npm install && npm run build
536
233
  npm test
537
-
538
- # Watch mode (auto-rebuild)
539
- npm run dev
540
-
541
- # Test the CLI locally
542
- node dist/cli.js https://example.com
543
-
544
- # Test the MCP server
545
- npm run mcp
546
234
  ```
547
235
 
548
236
  See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
549
237
 
550
- ---
551
-
552
- ## Roadmap
553
-
554
- - [x] CLI with smart escalation
555
- - [x] TypeScript library
556
- - [x] MCP server for Claude/Cursor/VS Code
557
- - [x] Hosted API with authentication and usage tracking
558
- - [x] Rate limiting and caching
559
- - [x] Batch processing API (`batch <file>`)
560
- - [x] Screenshot capture (`--screenshot`)
561
- - [x] CSS selector filtering (`--selector`, `--exclude`)
562
- - [x] DuckDuckGo search (`search <query>`)
563
- - [x] Custom headers and cookies
564
- - [x] Weekly reset usage model with extra usage
565
- - [x] Stealth mode (playwright-extra + anti-detect)
566
- - [x] Crawl mode (follow links, respect robots.txt)
567
- - [x] PDF extraction (v0.4.0)
568
- - [x] Structured data extraction with CSS selectors and JSON schema (v0.4.0)
569
- - [x] Page actions: click, scroll, type, fill, select, press, hover (v0.4.0)
570
- - [x] Map/sitemap discovery for full site URL mapping (v0.4.0)
571
- - [x] Token budget for output truncation (v0.4.0)
572
- - [x] Advanced crawl: sitemap-first, BFS/DFS, content deduplication (v0.4.0)
573
- - [ ] Webhook notifications for monitoring
574
-
575
- Vote on features and roadmap at [GitHub Discussions](https://github.com/JakeLiuMe/webpeel/discussions).
576
-
577
- ---
578
-
579
- ## FAQ
580
-
581
- **Q: How is this different from Firecrawl?**
582
- A: WebPeel has a more generous free tier (125/week vs Firecrawl's 500 one-time credits) and uses weekly resets like Claude Code. We also have smart escalation to avoid burning resources on simple pages.
238
+ ## Links
583
239
 
584
- **Q: Can I self-host the API server?**
585
- A: Yes! Run `npm run serve` to start the API server. See [docs/self-hosting.md](docs/self-hosting.md) (coming soon).
240
+ [Documentation](https://webpeel.dev/docs) · [Playground](https://webpeel.dev/playground) · [API Reference](https://webpeel.dev/docs/api-reference) · [npm](https://www.npmjs.com/package/webpeel) · [PyPI](https://pypi.org/project/webpeel/) · [Migration Guide](https://webpeel.dev/migrate-from-firecrawl) · [Blog](https://webpeel.dev/blog) · [Discussions](https://github.com/webpeel/webpeel/discussions)
586
241
 
587
- **Q: Does this violate websites' Terms of Service?**
588
- A: WebPeel is a tool — how you use it is up to you. Always check a site's ToS before fetching at scale. We recommend respecting `robots.txt` in your own workflows.
242
+ ## Star History
589
243
 
590
- **Q: What about Cloudflare and bot protection?**
591
- A: WebPeel handles most Cloudflare challenges automatically via stealth mode (available on all plans). For heavily protected sites, stealth mode uses browser fingerprint randomization to bypass detection.
592
-
593
- **Q: Can I use this in production?**
594
- A: Yes! The hosted API at `https://api.webpeel.dev` is production-ready with authentication, rate limiting, and usage tracking.
595
-
596
- ---
597
-
598
- ## Credits
599
-
600
- Built with:
601
- - [Playwright](https://playwright.dev/) — Headless browser automation
602
- - [Cheerio](https://cheerio.js.org/) — Fast HTML parsing
603
- - [Turndown](https://github.com/mixmark-io/turndown) — HTML to Markdown conversion
604
- - [Commander](https://github.com/tj/commander.js) — CLI framework
605
-
606
- ---
607
-
608
- ## Contributing
609
-
610
- Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
611
-
612
- ---
244
+ <a href="https://star-history.com/#webpeel/webpeel&Date">
245
+ <picture>
246
+ <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=webpeel/webpeel&type=Date&theme=dark" />
247
+ <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=webpeel/webpeel&type=Date" />
248
+ <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=webpeel/webpeel&type=Date" width="600" />
249
+ </picture>
250
+ </a>
613
251
 
614
252
  ## License
615
253
 
616
- MIT © [Jake Liu](https://github.com/JakeLiuMe)
254
+ MIT © [WebPeel](https://github.com/webpeel)
617
255
 
618
256
  ---
619
257
 
620
- **Like WebPeel?** [⭐ Star us on GitHub](https://github.com/JakeLiuMe/webpeel) — it helps others discover the project!
258
+ <p align="center">
259
+ <b>Like WebPeel?</b> <a href="https://github.com/webpeel/webpeel">⭐ Star us on GitHub</a> — it helps others discover the project!
260
+ </p>
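
The smart-escalation order described in the README above (HTTP fetch, then browser rendering, then stealth mode) can be read as a small decision rule. The sketch below is a hypothetical illustration under stated assumptions, not WebPeel's implementation: the names `Tier`, `Attempt`, `Decision`, `looksBlocked`, and `decide` are invented here, the 403/503 check follows the block signals named in the 0.6.1 README text, and the challenge-page phrases are assumed markers.

```typescript
// Hypothetical sketch of an escalation decision; not WebPeel's actual code.
type Tier = 'http' | 'browser' | 'stealth';

// Cheapest tier first, matching the order given in the README.
const LADDER: readonly Tier[] = ['http', 'browser', 'stealth'];

interface Attempt {
  tier: Tier;     // tier that produced this response
  status: number; // HTTP status code of the response
  body: string;   // response body (may be empty)
}

type Decision =
  | { kind: 'accept' }               // response looks usable
  | { kind: 'escalate'; next: Tier } // retry with the next, heavier tier
  | { kind: 'give-up' };             // blocked even at the last tier

// Assumed block signals: a 403/503 status or a challenge-page phrase.
function looksBlocked(attempt: Attempt): boolean {
  if (attempt.status === 403 || attempt.status === 503) return true;
  return /just a moment|checking your browser/i.test(attempt.body);
}

// Decide what to do after one attempt: keep the result, move one step
// up the ladder, or stop because no heavier tier is left.
function decide(attempt: Attempt): Decision {
  if (!looksBlocked(attempt)) return { kind: 'accept' };
  const index = LADDER.indexOf(attempt.tier);
  const next = LADDER[index + 1];
  return next ? { kind: 'escalate', next } : { kind: 'give-up' };
}
```

A caller would loop: fetch with the current tier, pass the response to `decide`, and either return the content, retry with `next`, or surface an error. Keeping the decision separate from the fetching code makes the ladder easy to test without a network or a browser.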