webpeel 0.2.0 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,11 +2,12 @@
 
  [![npm version](https://img.shields.io/npm/v/webpeel.svg)](https://www.npmjs.com/package/webpeel)
  [![npm downloads](https://img.shields.io/npm/dm/webpeel.svg)](https://www.npmjs.com/package/webpeel)
+ [![GitHub stars](https://img.shields.io/github/stars/JakeLiuMe/webpeel.svg)](https://github.com/JakeLiuMe/webpeel/stargazers)
  [![CI](https://github.com/JakeLiuMe/webpeel/actions/workflows/ci.yml/badge.svg)](https://github.com/JakeLiuMe/webpeel/actions/workflows/ci.yml)
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.6-blue.svg)](https://www.typescriptlang.org/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 
- Turn any web page into clean markdown. Zero config. Free forever.
+ Turn any web page into clean markdown. **Stealth mode. Crawl mode. Zero config. Free forever.**
 
  ```bash
  npx webpeel https://news.ycombinator.com
@@ -34,18 +35,40 @@ npx webpeel https://news.ycombinator.com
 
  | | **WebPeel** | Firecrawl | Jina Reader | MCP Fetch |
  |---|:---:|:---:|:---:|:---:|
- | **Local execution** | ✅ Free forever | ❌ Cloud only | ❌ Cloud only | ✅ Free |
+ | **Free tier** | ✅ 125/week | ❌ Cloud only | ❌ Cloud only | ✅ Unlimited |
  | **JS rendering** | ✅ Auto-escalates | ✅ Always | ❌ No | ❌ No |
- | **Anti-bot handling** | ✅ Stealth mode | ✅ Yes | ⚠️ Limited | ❌ No |
+ | **Stealth mode** | ✅ Built-in | ✅ Yes | ⚠️ Limited | ❌ No |
+ | **Crawl mode** | ✅ Built-in | ✅ Yes | ❌ No | ❌ No |
  | **MCP Server** | ✅ Built-in | ✅ Separate repo | ❌ No | ✅ Yes |
  | **Zero config** | ✅ `npx webpeel` | ❌ API key required | ❌ API key required | ✅ Yes |
- | **Free tier** | Unlimited local | 500 pages (one-time) | 1000 req/month | ∞ Local only |
- | **Hosted API** | $9/mo (5K pages) | $16/mo (3K pages) | $200/mo (Starter) | N/A |
- | **Credit rollover** | Up to 1 month | ❌ Expire monthly | ❌ N/A | ❌ N/A |
+ | **Free tier** | 125/week | 500 pages (one-time) | 1000 req/month | ∞ Unlimited |
+ | **Hosted API** | $9/mo (1,250/wk) | $16/mo (3K/mo) | $200/mo (Starter) | N/A |
+ | **Weekly reset** | N/A | ❌ Monthly only | ❌ Monthly only | ❌ N/A |
+ | **Extra usage** | N/A | ✅ Pay-as-you-go | ❌ Upgrade only | N/A |
+ | **Rollover** | N/A | ✅ 1 week | ❌ Expire monthly | ❌ N/A |
  | **Soft limits** | ✅ Never blocked | ❌ Hard cut-off | ❌ Rate limited | ❌ N/A |
  | **Markdown output** | ✅ Optimized for AI | ✅ Yes | ✅ Yes | ⚠️ Basic |
 
- **WebPeel gives you Firecrawl's power without the price tag.** Run locally for free, or use our hosted API when you need scale.
+ **WebPeel gives you Firecrawl's power with a generous free tier.** Like Claude Code pay only when you need more.
+
+ ### Usage Model
+
+ WebPeel uses a **weekly usage budget** for all users (CLI and API):
+
+ - **First 25 fetches**: No account needed — try it instantly
+ - **Free tier**: 125 fetches/week (resets every Monday)
+ - **Pro tier**: 1,250 fetches/week ($9/mo)
+ - **Max tier**: 6,250 fetches/week ($29/mo)
+
+ **Credit costs**: Basic fetch = 1 credit, Stealth mode = 5 credits, Search = 1 credit, Crawl = 1 credit/page
+
+ **Open source**: The CLI is MIT licensed — you can self-host if needed. But the hosted API requires authentication after 25 fetches.
+
+ ### Highlights
+
+ 1. **🎭 Stealth Mode** — Bypass bot detection with playwright-extra stealth plugin. Works on sites that block regular scrapers.
+ 2. **🕷️ Crawl Mode** — Follow links and extract entire sites. Respects robots.txt and rate limits automatically.
+ 3. **💰 Generous Free Tier** — Like Claude Code: 125 free fetches every week. First 25 work instantly, no signup. Open source MIT.
 
  ---
 
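The **Credit costs** line added to the README above weights operations differently (basic fetch 1, stealth 5, search 1, crawl 1 per page). A minimal TypeScript sketch of tallying a mixed workload under those weights; the `Op` type and the `creditsFor`/`totalCredits` helpers are hypothetical names for illustration, not part of the webpeel package:

```typescript
// Hypothetical illustration of the README's credit weights; not webpeel's API.
type Op =
  | { kind: "fetch" }
  | { kind: "stealth" }
  | { kind: "search" }
  | { kind: "crawl"; pages: number };

// Weights as stated in the README's "Credit costs" line.
function creditsFor(op: Op): number {
  switch (op.kind) {
    case "fetch":
      return 1;
    case "stealth":
      return 5;
    case "search":
      return 1;
    case "crawl":
      return op.pages; // 1 credit per crawled page
  }
}

function totalCredits(ops: Op[]): number {
  return ops.reduce((sum, op) => sum + creditsFor(op), 0);
}

// 10 basic fetches + 2 stealth fetches + a 20-page crawl
const week: Op[] = [
  ...Array.from({ length: 10 }, (): Op => ({ kind: "fetch" })),
  { kind: "stealth" },
  { kind: "stealth" },
  { kind: "crawl", pages: 20 },
];
console.log(totalCredits(week)); // 40
```

If the weekly budgets are counted in these credits (an assumption; the README quotes them as fetches), a Pro week of 1,250 would cover 250 stealth fetches or 1,250 basic ones.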
@@ -54,9 +77,21 @@ npx webpeel https://news.ycombinator.com
  ### CLI (Zero Install)
 
  ```bash
- # Basic usage
+ # First 25 fetches work instantly, no signup
  npx webpeel https://example.com
 
+ # After 25 fetches, sign up for free (125/week)
+ webpeel login
+
+ # Check your usage
+ webpeel usage
+
+ # Stealth mode (bypass bot detection)
+ npx webpeel https://protected-site.com --stealth
+
+ # Crawl a website (follow links, respect robots.txt)
+ npx webpeel crawl https://example.com --max-pages 20 --max-depth 2
+
  # JSON output with metadata
  npx webpeel https://example.com --json
 
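The new CLI comments describe a gate: the first 25 fetches run without an account, after which `webpeel login` is required. A minimal sketch of such a counter, reusing field names from the `CLIConfig` interface declared in `cli-auth.d.ts`; `gateAnonymous` and `ANONYMOUS_LIMIT` are hypothetical, and the sketch ignores `lastReset` because the diff does not say whether the anonymous count ever resets:

```typescript
// Hypothetical sketch of the anonymous gate: 25 fetches without an API key,
// then the user is asked to log in. Field names follow the declared CLIConfig;
// the logic is an assumption, not the package's implementation.
interface CLIConfig {
  apiKey?: string;
  anonymousUsage: number;
  lastReset: string;
}

const ANONYMOUS_LIMIT = 25; // README: "First 25 fetches work instantly, no signup"

interface Gate {
  allowed: boolean;
  remaining?: number;
  message?: string;
}

// Returns the decision plus the updated config (the caller would persist it).
function gateAnonymous(config: CLIConfig): { gate: Gate; next: CLIConfig } {
  if (config.apiKey) {
    // Authenticated users are metered against the weekly budget instead.
    return { gate: { allowed: true }, next: config };
  }
  if (config.anonymousUsage >= ANONYMOUS_LIMIT) {
    return {
      gate: {
        allowed: false,
        message: "Free anonymous fetches used up. Run `webpeel login` (125/week free).",
      },
      next: config,
    };
  }
  const used = config.anonymousUsage + 1;
  return {
    gate: { allowed: true, remaining: ANONYMOUS_LIMIT - used },
    next: { ...config, anonymousUsage: used },
  };
}

let cfg: CLIConfig = { anonymousUsage: 24, lastReset: new Date(0).toISOString() };
const first = gateAnonymous(cfg);
cfg = first.next;
console.log(first.gate); // { allowed: true, remaining: 0 }
console.log(gateAnonymous(cfg).gate.allowed); // false
```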
@@ -91,9 +126,9 @@ const result = await peel('https://example.com', {
  });
  ```
 
- ### MCP Server (Claude Desktop, Cursor, VS Code)
+ ### MCP Server (Claude Desktop, Cursor, VS Code, Windsurf)
 
- WebPeel provides two MCP tools: `webpeel_fetch` (fetch a URL) and `webpeel_search` (DuckDuckGo search + fetch results).
+ WebPeel provides four MCP tools: `webpeel_fetch` (fetch a URL), `webpeel_search` (search the web), `webpeel_batch` (fetch multiple URLs), and `webpeel_crawl` (crawl a site).
 
  #### Claude Desktop
 
@@ -145,6 +180,50 @@ Or install with one click:
  [![Install in Claude Desktop](https://img.shields.io/badge/Install-Claude%20Desktop-5B3FFF?style=for-the-badge&logo=anthropic)](https://mcp.so/install/webpeel?for=claude)
  [![Install in VS Code](https://img.shields.io/badge/Install-VS%20Code-007ACC?style=for-the-badge&logo=visualstudiocode)](https://mcp.so/install/webpeel?for=vscode)
 
+ #### Windsurf
+
+ Add to `~/.codeium/windsurf/mcp_config.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "webpeel": {
+       "command": "npx",
+       "args": ["-y", "webpeel", "mcp"]
+     }
+   }
+ }
+ ```
+
+ ---
+
+ ## Use with Claude Code
+
+ One command to add WebPeel to Claude Code:
+
+ ```bash
+ claude mcp add webpeel -- npx -y webpeel mcp
+ ```
+
+ Or add to your project's `.mcp.json` for team sharing:
+
+ ```json
+ {
+   "mcpServers": {
+     "webpeel": {
+       "command": "npx",
+       "args": ["-y", "webpeel", "mcp"]
+     }
+   }
+ }
+ ```
+
+ This gives Claude Code access to:
+ - **webpeel_fetch** — Fetch any URL as clean markdown (with stealth mode for protected sites)
+ - **webpeel_search** — Search the web via DuckDuckGo
+ - **webpeel_batch** — Fetch multiple URLs concurrently
+ - **webpeel_crawl** — Crawl websites following links
+
  ---
 
  ## How It Works: Smart Escalation
@@ -156,16 +235,16 @@ WebPeel tries the fastest method first, then escalates only when needed:
  │                      Smart Escalation                       │
  └─────────────────────────────────────────────────────────────┘
 
- Simple HTTP Fetch       Browser Rendering       Stealth Mode
- ~200ms                  ~2 seconds              ~5 seconds
+ Simple HTTP Fetch       Browser Rendering       Stealth Mode
+ ~200ms                  ~2 seconds              ~5 seconds
  │                       │                       │
  ├─ User-Agent headers   ├─ Full JS execution    ├─ Anti-detect
- ├─ Cheerio parsing      ├─ Wait for content     ├─ Proxy rotation
- ├─ Fast & cheap         ├─ Screenshots          └─ Cloudflare bypass
- │                       │
- ▼                       ▼
- Works for 80%           Works for 19%           Works for 1%
- of websites             (JS-heavy sites)        (heavily protected)
+ ├─ Cheerio parsing      ├─ Wait for content     ├─ Fingerprint mask
+ ├─ Fast & cheap         ├─ Screenshots          ├─ Cloudflare bypass
+ │                       │
+ ▼                       ▼
+ Works for 80%           Works for 15%           Works for 5%
+ of websites             (JS-heavy sites)        (bot-protected)
  ```
 
  **Why this matters:**
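The escalation diagram above boils down to trying tiers in order of cost and stopping at the first usable result. A minimal TypeScript sketch of that loop; the stub fetchers and the 50-character `looksUsable` heuristic are assumptions made for illustration, not webpeel's actual detection logic:

```typescript
// Sketch of smart escalation: cheapest tier first, escalate only when the
// result looks unusable. Tier names follow the diagram; the fetchers and the
// looksUsable heuristic are hypothetical stand-ins, not webpeel internals.
type Tier = "http" | "browser" | "stealth";
type Fetcher = (url: string) => Promise<string | null>;

interface Result {
  tier: Tier | null; // tier that produced usable content, or null if none did
  html: string | null;
}

// Hypothetical heuristic: a missing or near-empty body means "escalate".
function looksUsable(html: string | null): html is string {
  return html !== null && html.trim().length >= 50;
}

async function fetchWithEscalation(url: string, fetchers: Record<Tier, Fetcher>): Promise<Result> {
  const order: Tier[] = ["http", "browser", "stealth"]; // ~200ms, ~2s, ~5s per the diagram
  for (const tier of order) {
    const html = await fetchers[tier](url);
    if (looksUsable(html)) return { tier, html };
  }
  return { tier: null, html: null };
}

// Stubs: plain HTTP gets an empty JS shell, the browser tier gets rendered text.
const stubs: Record<Tier, Fetcher> = {
  http: async () => '<div id="root"></div>',
  browser: async () => "<main>" + "Rendered article text. ".repeat(5) + "</main>",
  stealth: async () => null,
};

fetchWithEscalation("https://example.com", stubs).then((r) => console.log(r.tier)); // browser
```

Injecting the fetchers keeps the control flow testable without a network or browser; real browser and stealth tiers would wrap a headless browser (the README names playwright-extra for stealth).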
@@ -244,7 +323,7 @@ await cleanup(); // Close browser instances
 
  ## Hosted API
 
- Live at `https://webpeel-api.onrender.com` — or use the CLI locally for free.
+ Live at `https://webpeel-api.onrender.com` — authentication required after first 25 fetches.
 
  ```bash
  # Register and get your API key
@@ -257,29 +336,46 @@ curl "https://webpeel-api.onrender.com/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer wp_live_your_api_key"
  ```
 
- ### Pricing
+ ### Pricing — Weekly Reset Model
+
+ Usage resets every **Monday at 00:00 UTC**, just like Claude Code.
+
+ | Plan | Price | Weekly Fetches | Burst Limit | Stealth Mode | Extra Usage |
+ |------|------:|---------------:|:-----------:|:------------:|:-----------:|
+ | **Free** | $0 | 125/wk (~500/mo) | 25/hr | ❌ | ❌ |
+ | **Pro** | $9/mo | 1,250/wk (~5K/mo) | 100/hr | ✅ | ✅ |
+ | **Max** | $29/mo | 6,250/wk (~25K/mo) | 500/hr | ✅ | ✅ |
+
+ **Three layers of usage control:**
+ 1. **Burst limit** — Per-hour cap (25/hr free, 100/hr Pro, 500/hr Max) prevents hammering
+ 2. **Weekly limit** — Main usage gate, resets every Monday
+ 3. **Extra usage** — When you hit your weekly limit, keep fetching at pay-as-you-go rates
 
- | Plan | Price | Fetches/Month | JS Rendering | Key Features |
- |------|------:|---------------:|:------------:|----------|
- | **Local CLI** | $0 | ∞ Unlimited | ✅ | Full power, your machine |
- | **Cloud Free** | $0 | 500 | ❌ | Soft limits — never blocked |
- | **Cloud Pro** | $9/mo | 5,000 | ✅ | Credit rollover, soft limits |
- | **Cloud Max** | $29/mo | 25,000 | ✅ | Priority queue, credit rollover |
+ **Extra usage rates (Pro/Max only):**
+ | Fetch Type | Cost |
+ |-----------|------|
+ | Basic (HTTP) | $0.002 |
+ | Stealth (browser) | $0.01 |
+ | Search | $0.001 |
 
- ### Why WebPeel Pro Beats Firecrawl
+ ### Why WebPeel Beats Firecrawl
 
- | Feature | WebPeel Local | WebPeel Pro | Firecrawl Hobby |
+ | Feature | WebPeel Free | WebPeel Pro | Firecrawl Hobby |
  |---------|:-------------:|:-----------:|:---------------:|
  | **Price** | $0 | $9/mo | $16/mo |
- | **Monthly Fetches** | | 5,000 | 3,000 |
- | **Credit Rollover** | N/A | ✅ 1 month | ❌ Expire monthly |
- | **Soft Limits** | ✅ Always | ✅ Never locked out | ❌ Hard cut-off |
- | **Self-Host** | MIT | N/A | ❌ AGPL |
+ | **Weekly Fetches** | 125/wk | 1,250/wk | ~750/wk |
+ | **Rollover** | | ✅ 1 week | ❌ Expire monthly |
+ | **Soft Limits** | ✅ Degrades | ✅ Never locked out | ❌ Hard cut-off |
+ | **Extra Usage** | | Pay-as-you-go | ❌ Upgrade only |
+ | **Self-Host** | ✅ MIT | ✅ MIT | ❌ AGPL |
 
  **Key differentiators:**
- - **Soft limits on every tier** — When you hit your limit, we degrade to HTTP-only instead of blocking you. Even free users are never locked out.
- - **Credits roll over** — Unused fetches carry forward for 1 month (Firecrawl expires monthly)
- - **CLI is always free** — No vendor lock-in. Run unlimited locally forever.
+ - **Like Claude Code** — Generous free tier (125/week), pay when you need more
+ - **Weekly resets** — Your usage refreshes every Monday, not once a month
+ - **Soft limits on every tier** — At 100%, we degrade gracefully instead of blocking you
+ - **Extra usage** — Pro/Max users can toggle on pay-as-you-go with spending caps (no surprise bills)
+ - **First 25 free** — Try it instantly, no signup required
+ - **Open source** — MIT licensed, self-host if you want full control
 
  See pricing at [webpeel.dev](https://webpeel.dev/#pricing)
 
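The pricing hunk describes three gates: an hourly burst cap, a weekly budget that resets every Monday at 00:00 UTC, and opt-in pay-as-you-go overage bounded by a spending cap. A minimal TypeScript sketch of how they could compose, using the limits and rates from the tables; the `Account` shape, the `admit` function, and counting the weekly budget in fetches are assumptions, and because the README only says over-limit usage "degrades gracefully", the sketch merely flags such requests as degraded:

```typescript
// Sketch of the three usage gates from the pricing section. Limits and rates
// are copied from the README tables; everything else (Account, admit, what
// "degraded" means) is a hypothetical illustration, not the hosted API's code.
type Plan = "free" | "pro" | "max";
type FetchType = "basic" | "stealth" | "search";

const WEEKLY_LIMIT: Record<Plan, number> = { free: 125, pro: 1250, max: 6250 };
const BURST_PER_HOUR: Record<Plan, number> = { free: 25, pro: 100, max: 500 };
const EXTRA_RATE_USD: Record<FetchType, number> = { basic: 0.002, stealth: 0.01, search: 0.001 };

// Start of the current usage week: the most recent Monday at 00:00 UTC.
function weekStart(now: Date): Date {
  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): Sunday = 0, Monday = 1
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d;
}

// The next weekly reset is exactly seven days after the current week started.
function nextReset(now: Date): Date {
  const d = weekStart(now);
  d.setUTCDate(d.getUTCDate() + 7);
  return d;
}

interface Account {
  plan: Plan;
  usedThisHour: number;
  usedThisWeek: number;
  extraUsageEnabled: boolean; // opt-in toggle, Pro/Max only
  extraSpentUsd: number;
  extraCapUsd: number; // user-set spending cap
}

interface Decision {
  allowed: boolean;
  billedUsd: number; // pay-as-you-go charge for this fetch
  degraded: boolean; // over the weekly budget with no extra usage available
  reason?: "plan" | "burst";
}

function admit(acct: Account, type: FetchType): Decision {
  // The Free plan has no stealth mode in the pricing table.
  if (type === "stealth" && acct.plan === "free") {
    return { allowed: false, billedUsd: 0, degraded: false, reason: "plan" };
  }
  // 1. Burst limit: per-hour cap.
  if (acct.usedThisHour >= BURST_PER_HOUR[acct.plan]) {
    return { allowed: false, billedUsd: 0, degraded: false, reason: "burst" };
  }
  // 2. Weekly limit: the main gate.
  if (acct.usedThisWeek < WEEKLY_LIMIT[acct.plan]) {
    return { allowed: true, billedUsd: 0, degraded: false };
  }
  // 3. Extra usage: Pro/Max only, opt-in, bounded by the spending cap.
  const cost = EXTRA_RATE_USD[type];
  const canBill =
    acct.plan !== "free" && acct.extraUsageEnabled && acct.extraSpentUsd + cost <= acct.extraCapUsd;
  if (canBill) return { allowed: true, billedUsd: cost, degraded: false };
  // Soft limit: keep serving, but flag the request as degraded rather than blocking it.
  return { allowed: true, billedUsd: 0, degraded: true };
}

console.log(nextReset(new Date("2025-01-08T15:30:00Z")).toISOString()); // 2025-01-13T00:00:00.000Z

const pro: Account = {
  plan: "pro",
  usedThisHour: 3,
  usedThisWeek: 1250,
  extraUsageEnabled: true,
  extraSpentUsd: 0,
  extraCapUsd: 5,
};
console.log(admit(pro, "stealth")); // { allowed: true, billedUsd: 0.01, degraded: false }
```

Ordering the burst check before the weekly check means a client hammering the API is throttled even when it still has weekly budget left, which matches the README's description of the burst limit as the layer that "prevents hammering".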
@@ -388,12 +484,19 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
  - [x] CLI with smart escalation
  - [x] TypeScript library
  - [x] MCP server for Claude/Cursor/VS Code
- - [ ] Hosted API with authentication
- - [ ] Rate limiting and caching
- - [ ] Batch processing API
- - [ ] Screenshot capture
+ - [x] Hosted API with authentication and usage tracking
+ - [x] Rate limiting and caching
+ - [x] Batch processing API (`batch <file>`)
+ - [x] Screenshot capture (`--screenshot`)
+ - [x] CSS selector filtering (`--selector`, `--exclude`)
+ - [x] DuckDuckGo search (`search <query>`)
+ - [x] Custom headers and cookies
+ - [x] Weekly reset usage model with extra usage
+ - [x] Stealth mode (playwright-extra + anti-detect)
+ - [x] Crawl mode (follow links, respect robots.txt)
  - [ ] PDF extraction
  - [ ] Webhook notifications for monitoring
+ - [ ] AI CAPTCHA solving (planned)
 
  Vote on features and roadmap at [GitHub Discussions](https://github.com/JakeLiuMe/webpeel/discussions).
 
@@ -402,7 +505,7 @@ Vote on features and roadmap at [GitHub Discussions](https://github.com/JakeLiuM
  ## FAQ
 
  **Q: How is this different from Firecrawl?**
- A: WebPeel runs locally for free (Firecrawl is cloud-only). We also have smart escalation to avoid burning resources on simple pages.
+ A: WebPeel has a more generous free tier (125/week vs Firecrawl's 500 one-time credits) and uses weekly resets like Claude Code. We also have smart escalation to avoid burning resources on simple pages.
 
  **Q: Can I self-host the API server?**
  A: Yes! Run `npm run serve` to start the API server. See [docs/self-hosting.md](docs/self-hosting.md) (coming soon).
@@ -411,10 +514,10 @@ A: Yes! Run `npm run serve` to start the API server. See [docs/self-hosting.md](
  A: WebPeel is a tool — how you use it is up to you. Always check a site's ToS before fetching at scale. We recommend respecting `robots.txt` in your own workflows.
 
  **Q: What about CAPTCHA and Cloudflare?**
- A: WebPeel handles most Cloudflare challenges automatically. For CAPTCHAs, you'll need a solving service (not included).
+ A: WebPeel handles most Cloudflare challenges automatically via stealth mode. AI-powered CAPTCHA solving is on our roadmap.
 
  **Q: Can I use this in production?**
- A: Yes, but be mindful of rate limits. The hosted API (coming soon) is better for high-volume production use.
+ A: Yes! The hosted API at `https://webpeel-api.onrender.com` is production-ready with authentication, rate limiting, and usage tracking.
 
  ---
 
@@ -0,0 +1,84 @@
+ /**
+  * CLI Authentication & Usage Tracking
+  *
+  * Handles:
+  * - Anonymous usage (25 free fetches)
+  * - API key authentication
+  * - Usage checking against API
+  * - Config file management (~/.webpeel/config.json)
+  */
+ interface CLIConfig {
+     apiKey?: string;
+     anonymousUsage: number;
+     lastReset: string;
+     planTier?: string;
+     planCachedAt?: string;
+ }
+ interface UsageCheckResult {
+     allowed: boolean;
+     message?: string;
+     isAnonymous?: boolean;
+     usageInfo?: {
+         used: number;
+         limit: number;
+         remaining: number;
+     };
+ }
+ /**
+  * Load config from ~/.webpeel/config.json
+  */
+ export declare function loadConfig(): CLIConfig;
+ /**
+  * Save config to ~/.webpeel/config.json
+  */
+ export declare function saveConfig(config: CLIConfig): void;
+ /**
+  * Delete config file
+  */
+ export declare function deleteConfig(): void;
+ /**
+  * Check usage quota before making a request
+  */
+ export declare function checkUsage(): Promise<UsageCheckResult>;
+ export type PremiumFeature = 'stealth' | 'crawl' | 'batch';
+ /**
+  * Check if user has access to a premium feature.
+  * Returns { allowed: true } for paid users, or a helpful upgrade message.
+  *
+  * Priority:
+  * 1. No API key → blocked (must sign up)
+  * 2. Has API key + cached plan → check plan tier
+  * 3. Has API key + no cache → check API, then cache
+  * 4. API unreachable + cached plan within 7 days → use cache
+  * 5. API unreachable + stale cache → allow gracefully (trust the user)
+  */
+ export declare function checkFeatureAccess(feature: PremiumFeature): Promise<{
+     allowed: boolean;
+     message?: string;
+ }>;
+ /**
+  * Show usage footer after successful fetch (for free/anonymous users only)
+  */
+ export declare function showUsageFooter(usageInfo: {
+     used: number;
+     limit: number;
+     remaining: number;
+ } | undefined, isAnonymous: boolean, stealth?: boolean): void;
+ /**
+  * Prompt user for API key via stdin
+  */
+ export declare function promptForApiKey(): Promise<string>;
+ /**
+  * Login command - save API key to config
+  */
+ export declare function handleLogin(): Promise<void>;
+ /**
+  * Logout command - remove API key from config
+  */
+ export declare function handleLogout(): void;
+ /**
+  * Usage command - show current quota
+  */
+ export declare function handleUsage(): Promise<void>;
+ export {};
+ //# sourceMappingURL=cli-auth.d.ts.map
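The doc comment on `checkFeatureAccess` lists a five-step priority. A minimal TypeScript sketch of that decision order, assuming `pro` and `max` are the paid tiers and that a cached plan is trusted for one hour before the API is re-checked (the comment only fixes the 7-day offline fallback). `checkFeatureAccessSketch` and its injected `fetchPlan` lookup are hypothetical; unlike the declared function, it takes the config and lookup as parameters so it can run offline:

```typescript
// Sketch of the five-step priority documented on checkFeatureAccess.
// Field names come from the declared CLIConfig interface; PAID_TIERS,
// REFRESH_AFTER_MS and the injected fetchPlan lookup are assumptions.
type PremiumFeature = "stealth" | "crawl" | "batch";

interface CLIConfig {
  apiKey?: string;
  anonymousUsage: number;
  lastReset: string;
  planTier?: string;
  planCachedAt?: string; // ISO timestamp of the last successful plan lookup
}

interface Access {
  allowed: boolean;
  message?: string;
}

const PAID_TIERS = new Set(["pro", "max"]);
const REFRESH_AFTER_MS = 60 * 60 * 1000; // hypothetical: trust a cached plan for 1 hour
const OFFLINE_GRACE_MS = 7 * 24 * 60 * 60 * 1000; // "cached plan within 7 days"

function byTier(tier: string, feature: PremiumFeature): Access {
  return PAID_TIERS.has(tier)
    ? { allowed: true }
    : { allowed: false, message: `'${feature}' requires a paid plan.` };
}

async function checkFeatureAccessSketch(
  feature: PremiumFeature,
  config: CLIConfig,
  fetchPlan: (apiKey: string) => Promise<string>, // rejects when the API is unreachable
  now: number = Date.now(),
): Promise<Access> {
  // 1. No API key: blocked until the user signs up.
  if (!config.apiKey) {
    return { allowed: false, message: "Run `webpeel login` to use this feature." };
  }
  const cacheAge = config.planCachedAt ? now - Date.parse(config.planCachedAt) : Infinity;
  // 2. API key + recently cached plan: decide from the cache alone.
  if (config.planTier && cacheAge <= REFRESH_AFTER_MS) {
    return byTier(config.planTier, feature);
  }
  try {
    // 3. API key + no usable cache: ask the API, then cache the answer
    //    (persisting it to disk, e.g. with saveConfig, is left out here).
    const tier = await fetchPlan(config.apiKey);
    config.planTier = tier;
    config.planCachedAt = new Date(now).toISOString();
    return byTier(tier, feature);
  } catch {
    // 4. API unreachable + cache younger than 7 days: trust the cache.
    if (config.planTier && cacheAge <= OFFLINE_GRACE_MS) {
      return byTier(config.planTier, feature);
    }
    // 5. API unreachable + stale (or, in this sketch, missing) cache: allow gracefully.
    return { allowed: true };
  }
}

const twoDaysAgo = new Date(Date.now() - 2 * 24 * 60 * 60 * 1000).toISOString();
const cfg: CLIConfig = {
  apiKey: "wp_live_example",
  anonymousUsage: 0,
  lastReset: "",
  planTier: "free",
  planCachedAt: twoDaysAgo,
};
const offline = async (): Promise<string> => {
  throw new Error("API unreachable");
};
// Step 4: API down, free plan cached two days ago, so stealth stays locked.
checkFeatureAccessSketch("stealth", cfg, offline).then((a) => console.log(a.allowed)); // false
```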
@@ -0,0 +1 @@
+ {"version":3,"file":"cli-auth.d.ts","sourceRoot":"","sources":["../src/cli-auth.ts"],"names":[],"mappings":"AAAA;;;;;;;;GAQG;AAcH,UAAU,SAAS;IACjB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,cAAc,EAAE,MAAM,CAAC;IACvB,SAAS,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;CACvB;AAED,UAAU,gBAAgB;IACxB,OAAO,EAAE,OAAO,CAAC;IACjB,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,SAAS,CAAC,EAAE;QACV,IAAI,EAAE,MAAM,CAAC;QACb,KAAK,EAAE,MAAM,CAAC;QACd,SAAS,EAAE,MAAM,CAAC;KACnB,CAAC;CACH;AAsBD;;GAEG;AACH,wBAAgB,UAAU,IAAI,SAAS,CAyBtC;AAED;;GAEG;AACH,wBAAgB,UAAU,CAAC,MAAM,EAAE,SAAS,GAAG,IAAI,CAWlD;AAED;;GAEG;AACH,wBAAgB,YAAY,IAAI,IAAI,CAQnC;AAwBD;;GAEG;AACH,wBAAsB,UAAU,IAAI,OAAO,CAAC,gBAAgB,CAAC,CA+F5D;AAGD,MAAM,MAAM,cAAc,GAAG,SAAS,GAAG,OAAO,GAAG,OAAO,CAAC;AAQ3D;;;;;;;;;;GAUG;AACH,wBAAsB,kBAAkB,CAAC,OAAO,EAAE,cAAc,GAAG,OAAO,CAAC;IAAE,OAAO,EAAE,OAAO,CAAC;IAAC,OAAO,CAAC,EAAE,MAAM,CAAA;CAAE,CAAC,CAmEjH;AAED;;GAEG;AACH,wBAAgB,eAAe,CAC7B,SAAS,EAAE;IAAE,IAAI,EAAE,MAAM,CAAC;IAAC,KAAK,EAAE,MAAM,CAAC;IAAC,SAAS,EAAE,MAAM,CAAA;CAAE,GAAG,SAAS,EACzE,WAAW,EAAE,OAAO,EACpB,OAAO,GAAE,OAAe,GACvB,IAAI,CAaN;AAED;;GAEG;AACH,wBAAsB,eAAe,IAAI,OAAO,CAAC,MAAM,CAAC,CAYvD;AAED;;GAEG;AACH,wBAAsB,WAAW,IAAI,OAAO,CAAC,IAAI,CAAC,CA8BjD;AAED;;GAEG;AACH,wBAAgB,YAAY,IAAI,IAAI,CAUnC;AAED;;GAEG;AACH,wBAAsB,WAAW,IAAI,OAAO,CAAC,IAAI,CAAC,CAkEjD"}