webpeel 0.17.4 → 0.17.6

@@ -0,0 +1,4182 @@
1
+ openapi: 3.1.0
2
+
3
+ info:
4
+ title: WebPeel API
5
+ description: |
6
+ **WebPeel** is a high-performance web scraping and data extraction API. It turns any
7
+ website into clean, structured content — markdown, text, JSON, or screenshots — with
8
+ built-in bot detection bypass, JavaScript rendering, and smart token budgeting.
9
+
10
+ ## Authentication
11
+
12
+ All endpoints (except `/health` and `/openapi.yaml`) require an API key. Pass it in
13
+ the `Authorization` header:
14
+
15
+ ```
16
+ Authorization: Bearer wp_YOUR_KEY
17
+ ```
18
+
19
+ API keys start with the `wp_` prefix. Get one free at [app.webpeel.dev](https://app.webpeel.dev).
20
+
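A minimal client-side sketch of the header scheme described above, using only the Python standard library. The helper names `auth_headers` and `build_fetch_request` are hypothetical. The request is built but never sent, and the `/v1/fetch` path and `url` body field are taken from the `FetchPostBody` schema further down in this spec.

```python
import json
import urllib.request

API_BASE = "https://api.webpeel.dev"  # production server listed under `servers`


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header (hypothetical helper, not part of WebPeel)."""
    if not api_key.startswith("wp_"):
        raise ValueError("WebPeel API keys start with the 'wp_' prefix")
    return {"Authorization": f"Bearer {api_key}"}


def build_fetch_request(api_key: str, url: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/fetch request with a JSON body."""
    body = json.dumps({"url": url}).encode("utf-8")
    headers = {**auth_headers(api_key), "Content-Type": "application/json"}
    return urllib.request.Request(
        f"{API_BASE}/v1/fetch", data=body, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; checking the prefix locally simply fails fast before a round trip that would end in `401`.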
21
+ ## Rate Limits
22
+
23
+ Rate limit information is returned in response headers:
24
+
25
+ | Header | Description |
26
+ |---|---|
27
+ | `X-RateLimit-Limit` | Requests allowed per window |
28
+ | `X-RateLimit-Remaining` | Requests remaining in current window |
29
+ | `X-RateLimit-Reset` | Unix timestamp when window resets |
30
+
31
+ When the limit is exceeded, the server responds with `429 Too Many Requests`.
32
+
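One way a client might act on these headers is sketched below. It assumes `X-RateLimit-Reset` is expressed in seconds since the epoch and that the headers arrive as a plain dict with the exact names from the table; `seconds_until_reset` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request is likely to be accepted.

    Returns 0.0 while requests remain in the current window; otherwise the
    time left until the window resets (never negative).
    """
    now = time.time() if now is None else now
    # Missing headers are treated as "not limited" (an assumption of this sketch).
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset_at - now)
```

A caller that receives `429 Too Many Requests` could sleep for the returned number of seconds before retrying.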
33
+ ## Caching
34
+
35
+ Fetch and search responses are cached automatically (5 min for fetch, 15 min for search).
36
+ Cache behaviour is communicated via response headers:
37
+
38
+ | Header | Description |
39
+ |---|---|
40
+ | `X-Cache` | `HIT` or `MISS` |
41
+ | `X-Cache-Age` | Age of cached response in seconds |
42
+
43
+ To bypass the cache, pass `?noCache=true` (GET) or `"noCache": true` (POST), or set
44
+ the `Cache-Control: no-cache` request header.
45
+
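The cache controls above can be sketched from the client side as follows. The function names are hypothetical, and the `url` query parameter name for GET requests is an assumption based on the POST body field of the same name.

```python
from typing import Mapping, Optional, Tuple
from urllib.parse import urlencode


def fetch_query(url: str, no_cache: bool = False) -> str:
    """Build a GET query string, optionally asking the server to skip its cache."""
    params = {"url": url}  # parameter name assumed to mirror the POST body field
    if no_cache:
        params["noCache"] = "true"
    return urlencode(params)


def cache_status(headers: Mapping[str, str]) -> Tuple[bool, Optional[int]]:
    """Interpret X-Cache / X-Cache-Age as (was_hit, age_in_seconds or None)."""
    if headers.get("X-Cache") == "HIT":
        age = headers.get("X-Cache-Age")
        return True, int(age) if age is not None else None
    return False, None
```

Sending the `Cache-Control: no-cache` request header is the documented alternative to the `noCache` parameter.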
46
+ ## Errors
47
+
48
+ All errors follow a consistent shape:
49
+
50
+ ```json
51
+ {
52
+ "success": false,
53
+ "error": {
54
+ "type": "invalid_request",
55
+ "message": "Human-readable description of the error.",
56
+ "hint": "Optional suggestion to fix the error.",
57
+ "docs": "https://webpeel.dev/docs/api-reference#errors"
58
+ },
59
+ "metadata": {
60
+ "requestId": "req_abc123"
61
+ }
62
+ }
63
+ ```
64
+
65
+ ### Error Codes
66
+
67
+ | Code | HTTP Status | Description |
68
+ |---|---|---|
69
+ | `invalid_request` | 400 | Missing or malformed parameters |
70
+ | `invalid_url` | 400 | URL is malformed or too long |
71
+ | `forbidden_url` | 400 | SSRF protection: private/localhost URLs not allowed |
72
+ | `unauthorized` | 401 | No API key provided |
73
+ | `invalid_api_key` | 401 | API key is invalid or revoked |
74
+ | `quota_exceeded` | 429 | Weekly request quota exceeded |
75
+ | `rate_limited` | 429 | Too many requests in time window |
76
+ | `TIMEOUT` | 504 | Request timed out |
77
+ | `BLOCKED` | 403 | Target site is blocking automated requests |
78
+ | `NETWORK` | 502 | Could not reach the target URL |
79
+ | `internal_error` | 500 | Unexpected server error |
80
+
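A small sketch of how a client might turn the error envelope above into an exception. `WebPeelError` and `parse_error` are hypothetical names; the status mapping is copied from the Error Codes table and is used only as a lookup, not as a statement of what the server actually returns.

```python
from typing import Any, Mapping, Optional

# Error type -> HTTP status, copied from the Error Codes table.
ERROR_STATUS = {
    "invalid_request": 400, "invalid_url": 400, "forbidden_url": 400,
    "unauthorized": 401, "invalid_api_key": 401,
    "quota_exceeded": 429, "rate_limited": 429,
    "TIMEOUT": 504, "BLOCKED": 403, "NETWORK": 502,
    "internal_error": 500,
}


class WebPeelError(Exception):
    """Client-side representation of the documented error envelope."""

    def __init__(self, type_: str, message: str,
                 hint: Optional[str] = None, request_id: Optional[str] = None):
        super().__init__(f"{type_}: {message}")
        self.type = type_
        self.message = message
        self.hint = hint
        self.request_id = request_id
        self.status = ERROR_STATUS.get(type_)  # None for types not in the table


def parse_error(payload: Mapping[str, Any]) -> Optional[WebPeelError]:
    """Return a WebPeelError for an error envelope, or None for other payloads."""
    if payload.get("success") is not False:
        return None
    err = payload["error"]  # `type` and `message` are required by the schema
    meta = payload.get("metadata", {})
    return WebPeelError(err["type"], err["message"],
                        err.get("hint"), meta.get("requestId"))
```

Keeping `requestId` on the exception makes it easy to quote in a support request.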
81
+ version: 0.15.2
82
+ contact:
83
+ name: WebPeel Support
84
+ url: https://webpeel.dev
85
+ email: hello@webpeel.dev
86
+ license:
87
+ name: MIT
88
+ url: https://opensource.org/licenses/MIT
89
+ x-logo:
90
+ url: https://webpeel.dev/logo.png
91
+ altText: WebPeel
92
+
93
+ servers:
94
+ - url: https://api.webpeel.dev
95
+ description: Production
96
+ - url: http://localhost:3000
97
+ description: Local development
98
+
99
+ security:
100
+ - BearerAuth: []
101
+
102
+ tags:
103
+ - name: Health
104
+ description: Server health and status checks. No authentication required.
105
+ - name: Fetch
106
+ description: |
107
+ Fetch any URL and return clean, structured content. The core WebPeel operation.
108
+ Supports markdown, text, and HTML output; browser rendering; JavaScript execution;
109
+ bot detection bypass; screenshots; and smart token budgeting.
110
+ - name: Search
111
+ description: |
112
+ Search the web and return structured results with titles, URLs, and snippets.
113
+ Supports DuckDuckGo (free, no key) and Brave Search (BYOK for higher quality).
114
+ - name: Crawl
115
+ description: |
116
+ Crawl a website starting from a URL up to a configurable depth and page limit.
117
+ Jobs run asynchronously — poll the job status endpoint to retrieve results.
118
+ - name: Map
119
+ description: |
120
+ Discover all URLs on a domain via sitemap parsing and link crawling.
121
+ Returns a flat list of URLs found on the site.
122
+ - name: Extract
123
+ description: |
124
+ Extract structured data from a URL using JSON Schema + LLM (BYOK) or
125
+ auto-detection heuristics (no LLM key required).
126
+ - name: Answer
127
+ description: |
128
+ LLM-free question answering on a URL using BM25 relevance scoring.
129
+ No external API key required.
130
+ - name: YouTube
131
+ description: Extract full transcripts and metadata from YouTube videos.
132
+ - name: Screenshot
133
+ description: Capture full-page or viewport screenshots of any URL.
134
+ - name: Batch
135
+ description: |
136
+ Submit multiple URLs for concurrent scraping. Jobs are processed asynchronously
137
+ and results can be retrieved via the jobs API or delivered to a webhook.
138
+ - name: Research
139
+ description: |
140
+ Multi-source deep research: search the web, fetch top results, and synthesise
141
+ into a merged or structured report in a single API call.
142
+ - name: Watch
143
+ description: |
144
+ Monitor URLs for content changes. Create watchers with configurable check intervals
145
+ and receive notifications via webhooks when changes are detected.
146
+ - name: Auth
147
+ description: |
148
+ User registration, login, and API key management. These endpoints use JWT
149
+ tokens for session management, separate from the API key auth used by all
150
+ other endpoints.
151
+ - name: MCP
152
+ description: |
153
+ Model Context Protocol (MCP) endpoints for AI assistant integration.
154
+ Connect Claude, GPT-4, and other AI tools directly to WebPeel's capabilities.
155
+ Uses [MCP Streamable HTTP transport](https://modelcontextprotocol.io/) (JSON-RPC over HTTP).
156
+
157
+ components:
158
+ securitySchemes:
159
+ BearerAuth:
160
+ type: http
161
+ scheme: bearer
162
+ bearerFormat: "wp_<key>"
163
+ description: |
164
+ WebPeel API key. All keys start with the `wp_` prefix.
165
+ Get one free at [app.webpeel.dev](https://app.webpeel.dev).
166
+
167
+ schemas:
168
+ # -------------------------------------------------------------------------
169
+ # Core result types
170
+ # -------------------------------------------------------------------------
171
+
172
+ PeelResult:
173
+ type: object
174
+ description: The result of a fetch or scrape operation.
175
+ required: [url, title, content, metadata, links, tokens, method, elapsed]
176
+ properties:
177
+ url:
178
+ type: string
179
+ format: uri
180
+ description: Final URL after any redirects.
181
+ example: https://example.com/page
182
+ title:
183
+ type: string
184
+ description: Page title extracted from `<title>` or Open Graph tags.
185
+ example: "Example Domain"
186
+ content:
187
+ type: string
188
+ description: Page content in the requested format (markdown, text, or HTML).
189
+ example: "# Example Domain\n\nThis domain is for use in illustrative examples..."
190
+ metadata:
191
+ $ref: "#/components/schemas/PageMetadata"
192
+ links:
193
+ type: array
194
+ items:
195
+ type: string
196
+ format: uri
197
+ description: All unique links found on the page (absolute URLs).
198
+ example: ["https://www.iana.org/domains/example"]
199
+ tokens:
200
+ type: integer
201
+ description: Rough token count estimate, computed as content.length / 4.
202
+ example: 42
203
+ method:
204
+ type: string
205
+ enum: [simple, browser, stealth]
206
+ description: Fetch method used.
207
+ example: simple
208
+ elapsed:
209
+ type: integer
210
+ description: Time elapsed in milliseconds.
211
+ example: 234
212
+ screenshot:
213
+ type: string
214
+ description: Base64-encoded PNG screenshot. Only present when `screenshot=true`.
215
+ example: "iVBORw0KGgoAAAANSUhEUgAA..."
216
+ contentType:
217
+ type: string
218
+ description: Content type detected (html, json, xml, text, rss, etc.).
219
+ example: html
220
+ quality:
221
+ type: number
222
+ format: float
223
+ minimum: 0
224
+ maximum: 1
225
+ description: Content quality score — how clean the extraction was.
226
+ example: 0.87
227
+ fingerprint:
228
+ type: string
229
+ description: SHA256 hash of content (first 16 chars) for change detection.
230
+ example: "a1b2c3d4e5f6a7b8"
231
+ extracted:
232
+ type: object
233
+ additionalProperties: true
234
+ description: Structured data from CSS selector or heuristic extraction.
235
+ json:
236
+ type: object
237
+ additionalProperties: true
238
+ description: Structured JSON from inline LLM extraction (BYOK).
239
+ images:
240
+ type: array
241
+ items:
242
+ $ref: "#/components/schemas/ImageInfo"
243
+ description: Extracted images. Only present when `images=true`.
244
+ prunedPercent:
245
+ type: integer
246
+ description: Percentage of HTML pruned by content density scoring (0–100).
247
+ example: 43
248
+ readability:
249
+ type: object
250
+ description: Reader mode metadata. Only present when `readable=true`.
251
+ properties:
252
+ title:
253
+ type: string
254
+ author:
255
+ type: string
256
+ publishDate:
257
+ type: string
258
+ readingTime:
259
+ type: integer
260
+ description: Estimated reading time in minutes.
261
+ wordCount:
262
+ type: integer
263
+ excerpt:
264
+ type: string
265
+ linkCount:
266
+ type: integer
267
+ description: Number of unique links found on the page.
268
+ example: 12
269
+ quickAnswer:
270
+ type: object
271
+ description: BM25 quick answer result. Only present when `question` is set.
272
+ properties:
273
+ question:
274
+ type: string
275
+ answer:
276
+ type: string
277
+ confidence:
278
+ type: number
279
+ passages:
280
+ type: array
281
+ items:
282
+ type: string
283
+ method:
284
+ type: string
285
+ timing:
286
+ type: object
287
+ description: Per-stage timing breakdown in milliseconds.
288
+ additionalProperties:
289
+ type: integer
290
+ freshness:
291
+ type: object
292
+ description: Content freshness metadata from HTTP response headers.
293
+ properties:
294
+ lastModified:
295
+ type: string
296
+ etag:
297
+ type: string
298
+ fetchedAt:
299
+ type: string
300
+ format: date-time
301
+ cacheControl:
302
+ type: string
303
+ warning:
304
+ type: string
305
+ description: Warning when content may be incomplete or degraded.
306
+
307
+ PageMetadata:
308
+ type: object
309
+ description: Metadata extracted from the page.
310
+ properties:
311
+ description:
312
+ type: string
313
+ description: Meta description.
314
+ example: "An example page for documentation."
315
+ author:
316
+ type: string
317
+ description: Author name.
318
+ example: "Jane Doe"
319
+ published:
320
+ type: string
321
+ format: date-time
322
+ description: Published date (ISO 8601).
323
+ image:
324
+ type: string
325
+ format: uri
326
+ description: Open Graph image URL.
327
+ canonical:
328
+ type: string
329
+ format: uri
330
+ description: Canonical URL.
331
+ contentType:
332
+ type: string
333
+ description: MIME content type (set for PDF, DOCX, etc.).
334
+ wordCount:
335
+ type: integer
336
+ description: Word count.
337
+ language:
338
+ type: string
339
+ description: "Page language (e.g., \"en\", \"en-US\")."
340
+ example: en
341
+
342
+ QuickAnswerResult:
343
+ type: object
344
+ description: Result from LLM-free BM25 question answering on a URL.
345
+ required: [url, question, answer, confidence, passages, method]
346
+ properties:
347
+ url:
348
+ type: string
349
+ format: uri
350
+ description: Final URL after any redirects.
351
+ example: https://example.com/pricing
352
+ title:
353
+ type: string
354
+ description: Page title.
355
+ example: "Pricing – Example"
356
+ question:
357
+ type: string
358
+ description: The question that was asked.
359
+ example: "What is the price of the pro plan?"
360
+ answer:
361
+ type: string
362
+ description: Best answer extracted from the page content using BM25 ranking.
363
+ example: "The Pro plan costs $49 per month."
364
+ confidence:
365
+ type: number
366
+ format: float
367
+ minimum: 0
368
+ maximum: 1
369
+ description: Confidence score for the answer (0–1).
370
+ example: 0.83
371
+ passages:
372
+ type: array
373
+ items:
374
+ type: string
375
+ description: Top-ranked text passages relevant to the question.
376
+ example:
377
+ - "Pro plan: $49/month — includes unlimited projects and priority support."
378
+ - "All plans include a 14-day free trial."
379
+ source:
380
+ type: string
381
+ format: uri
382
+ description: URL the answer was extracted from.
383
+ example: https://example.com/pricing
384
+ method:
385
+ type: string
386
+ description: Scoring method used (always `bm25`).
387
+ example: bm25
388
+
389
+ SearchResult:
390
+ type: object
391
+ description: A single search result entry.
392
+ required: [title, url, snippet]
393
+ properties:
394
+ title:
395
+ type: string
396
+ description: Page title.
397
+ example: "Latest AI News – TechCrunch"
398
+ url:
399
+ type: string
400
+ format: uri
401
+ description: Page URL.
402
+ example: https://techcrunch.com/2025/01/01/latest-ai-news
403
+ snippet:
404
+ type: string
405
+ description: Short description or excerpt.
406
+ example: "Here's everything you need to know about the latest developments in AI..."
407
+ content:
408
+ type: string
409
+ description: Full page content in markdown. Only present when `scrapeResults=true`.
410
+
411
+ SearchWebResult:
412
+ $ref: "#/components/schemas/SearchResult"
413
+
414
+ SearchImageResult:
415
+ type: object
416
+ required: [title, url, thumbnail, source]
417
+ properties:
418
+ title:
419
+ type: string
420
+ url:
421
+ type: string
422
+ format: uri
423
+ thumbnail:
424
+ type: string
425
+ format: uri
426
+ source:
427
+ type: string
428
+
429
+ SearchNewsResult:
430
+ type: object
431
+ required: [title, url, snippet, source]
432
+ properties:
433
+ title:
434
+ type: string
435
+ url:
436
+ type: string
437
+ format: uri
438
+ snippet:
439
+ type: string
440
+ source:
441
+ type: string
442
+ date:
443
+ type: string
444
+
445
+ ImageInfo:
446
+ type: object
447
+ description: Information about an image found on the page.
448
+ required: [src, alt]
449
+ properties:
450
+ src:
451
+ type: string
452
+ format: uri
453
+ description: Absolute URL of the image.
454
+ alt:
455
+ type: string
456
+ description: Alt text.
457
+ title:
458
+ type: string
459
+ description: Title attribute.
460
+ width:
461
+ type: integer
462
+ height:
463
+ type: integer
464
+
465
+ PageAction:
466
+ type: object
467
+ description: A browser interaction to perform before extracting content.
468
+ required: [type]
469
+ properties:
470
+ type:
471
+ type: string
472
+ enum: [click, type, fill, scroll, wait, press, hover, select, waitForSelector, screenshot]
473
+ description: Action type.
474
+ selector:
475
+ type: string
476
+ description: CSS selector for element-targeted actions.
477
+ example: "#search-input"
478
+ value:
479
+ type: string
480
+ description: Value/text for type, fill, or select actions.
481
+ example: "hello world"
482
+ text:
483
+ type: string
484
+ description: Alias for `value` (Firecrawl compatibility).
485
+ key:
486
+ type: string
487
+ description: Keyboard key for press actions.
488
+ example: "Enter"
489
+ ms:
490
+ type: integer
491
+ description: Wait duration in milliseconds (for wait actions).
492
+ example: 1000
493
+ milliseconds:
494
+ type: integer
495
+ description: Alias for `ms` (Firecrawl compatibility).
496
+ direction:
497
+ type: string
498
+ enum: [up, down, left, right]
499
+ description: Scroll direction.
500
+ amount:
501
+ type: integer
502
+ description: Scroll amount in pixels.
503
+ example: 500
504
+ timeout:
505
+ type: integer
506
+ description: Per-action timeout override in milliseconds.
507
+
508
+ InlineExtract:
509
+ type: object
510
+ description: Inline LLM-powered JSON extraction configuration (BYOK).
511
+ properties:
512
+ schema:
513
+ type: object
514
+ additionalProperties: true
515
+ description: JSON Schema describing the desired output structure.
516
+ prompt:
517
+ type: string
518
+ description: Natural language extraction prompt.
519
+ example: "Extract the product name, price, and availability."
520
+
521
+ WatchEntry:
522
+ type: object
523
+ description: A URL watcher entry.
524
+ required: [id, accountId, url, status, checkIntervalMinutes, createdAt]
525
+ properties:
526
+ id:
527
+ type: string
528
+ description: Unique watcher ID.
529
+ example: "watch_a1b2c3d4"
530
+ accountId:
531
+ type: string
532
+ format: uuid
533
+ description: Account that owns this watcher.
534
+ url:
535
+ type: string
536
+ format: uri
537
+ description: The URL being monitored.
538
+ example: https://example.com/pricing
539
+ webhookUrl:
540
+ type: string
541
+ format: uri
542
+ description: Webhook URL to notify when changes are detected.
543
+ example: https://your-server.com/webhook
544
+ checkIntervalMinutes:
545
+ type: integer
546
+ description: How often to check for changes, in minutes (1–44640, i.e. up to 31 days).
547
+ example: 60
548
+ selector:
549
+ type: string
550
+ description: CSS selector to limit change detection to a specific element.
551
+ example: ".price"
552
+ status:
553
+ type: string
554
+ enum: [active, paused]
555
+ description: Whether the watcher is active or paused.
556
+ example: active
557
+ lastCheckedAt:
558
+ type: string
559
+ format: date-time
560
+ description: When the URL was last checked.
561
+ lastChangeAt:
562
+ type: string
563
+ format: date-time
564
+ description: When the last change was detected.
565
+ createdAt:
566
+ type: string
567
+ format: date-time
568
+ description: When the watcher was created.
569
+
570
+ YouTubeSegment:
571
+ type: object
572
+ description: A single timed transcript segment.
573
+ required: [start, end, text]
574
+ properties:
575
+ start:
576
+ type: number
577
+ description: Start time in seconds.
578
+ example: 12.5
579
+ end:
580
+ type: number
581
+ description: End time in seconds.
582
+ example: 15.2
583
+ text:
584
+ type: string
585
+ description: Transcript text for this segment.
586
+ example: "Welcome to the tutorial."
587
+
588
+ # -------------------------------------------------------------------------
589
+ # Request bodies
590
+ # -------------------------------------------------------------------------
591
+
592
+ FetchPostBody:
593
+ type: object
594
+ description: Request body for POST /v1/fetch.
595
+ required: [url]
596
+ properties:
597
+ url:
598
+ type: string
599
+ format: uri
600
+ description: The URL to fetch. Must be HTTP or HTTPS. Max 2048 characters.
601
+ example: https://example.com
602
+ render:
603
+ type: boolean
604
+ default: false
605
+ description: Use headless browser rendering for JavaScript-heavy sites.
606
+ wait:
607
+ type: integer
608
+ minimum: 0
609
+ maximum: 60000
610
+ description: Milliseconds to wait after page load (only with render=true).
611
+ example: 2000
612
+ format:
613
+ type: string
614
+ enum: [markdown, text, html]
615
+ default: markdown
616
+ description: Output format.
617
+ stream:
618
+ type: boolean
619
+ default: false
620
+ description: Prepare a streaming response.
621
+ includeTags:
622
+ type: array
623
+ items:
624
+ type: string
625
+ description: Only include content from these HTML elements.
626
+ example: ["article", "main", ".content"]
627
+ excludeTags:
628
+ type: array
629
+ items:
630
+ type: string
631
+ description: Remove these HTML elements from the output.
632
+ example: ["nav", "footer", "header", ".sidebar"]
633
+ images:
634
+ type: boolean
635
+ default: false
636
+ description: Extract image URLs and alt text.
637
+ location:
638
+ type: string
639
+ description: ISO 3166-1 alpha-2 country code for geo-targeted content.
640
+ example: US
641
+ languages:
642
+ type: array
643
+ items:
644
+ type: string
645
+ description: Language preferences for browser rendering.
646
+ example: ["en-US"]
647
+ onlyMainContent:
648
+ type: boolean
649
+ default: false
650
+ description: Shortcut to include only main/article content (strips boilerplate).
651
+ actions:
652
+ type: array
653
+ items:
654
+ $ref: "#/components/schemas/PageAction"
655
+ description: Browser interactions to execute before extracting content. Auto-enables rendering.
656
+ noCache:
657
+ type: boolean
658
+ default: false
659
+ description: Bypass cache and force a fresh fetch.
660
+ cacheTtl:
661
+ type: integer
662
+ description: Cache TTL in seconds for this response. Default is 300 (5 minutes).
663
+ example: 600
664
+ budget:
665
+ type: integer
666
+ description: |
667
+ Smart token budget — distill content to fit within N tokens using heuristic
668
+ compression (no LLM needed). Default: 4000. Set to 0 to disable.
669
+ example: 4000
670
+ question:
671
+ type: string
672
+ description: Ask a question about the content. Uses BM25 relevance scoring (no LLM key needed).
673
+ example: "What is the pricing?"
674
+ readable:
675
+ type: boolean
676
+ default: false
677
+ description: Reader mode — extract only the main article, strip all noise.
678
+ stealth:
679
+ type: boolean
680
+ default: false
681
+ description: Stealth mode for bot-protected sites (Amazon, LinkedIn, etc.).
682
+ screenshot:
683
+ type: boolean
684
+ default: false
685
+ description: Also capture a screenshot (returned as base64 PNG in `screenshot` field).
686
+ maxTokens:
687
+ type: integer
688
+ description: Maximum token count — hard truncation (vs `budget` which is smart compression).
689
+ example: 8000
690
+ selector:
691
+ type: string
692
+ description: CSS selector to extract specific content only.
693
+ example: "article.post"
694
+ exclude:
695
+ type: array
696
+ items:
697
+ type: string
698
+ description: CSS selectors to exclude from output.
699
+ example: [".ads", ".sidebar"]
700
+ fullPage:
701
+ type: boolean
702
+ default: false
703
+ description: Disable smart pruning and return the full page content.
704
+ raw:
705
+ type: boolean
706
+ default: false
707
+ description: Skip smart content extraction — return full HTML without stripping boilerplate.
708
+ lite:
709
+ type: boolean
710
+ default: false
711
+ description: Lite mode — minimal processing for maximum speed (~50% faster).
712
+ timeout:
713
+ type: integer
714
+ description: Request timeout in milliseconds. Default is 30000.
715
+ example: 30000
716
+ extract:
717
+ $ref: "#/components/schemas/InlineExtract"
718
+ llmProvider:
719
+ type: string
720
+ enum: [openai, anthropic, google]
721
+ description: LLM provider for inline extraction (required when `extract` is set).
722
+ llmApiKey:
723
+ type: string
724
+ description: LLM API key (BYOK, required when `extract` is set).
725
+ example: "sk-..."
726
+ llmModel:
727
+ type: string
728
+ description: LLM model name (optional, uses provider default).
729
+ example: "gpt-4o-mini"
730
+ formats:
731
+ type: array
732
+ description: Firecrawl-compatible formats array. Use `{ "type": "json", "schema": {...} }` for extraction.
733
+ items:
734
+ oneOf:
735
+ - type: string
736
+ enum: [markdown, html, text, json]
737
+ - type: object
738
+ properties:
739
+ type:
740
+ type: string
741
+ enum: [json]
742
+ schema:
743
+ type: object
744
+ additionalProperties: true
745
+ prompt:
746
+ type: string
747
+
748
+ # -------------------------------------------------------------------------
749
+ # Error response
750
+ # -------------------------------------------------------------------------
751
+
752
+ Error:
753
+ type: object
754
+ description: Standard error response envelope.
755
+ required: [success, error]
756
+ properties:
757
+ success:
758
+ type: boolean
759
+ const: false
760
+ error:
761
+ type: object
762
+ required: [type, message]
763
+ properties:
764
+ type:
765
+ type: string
766
+ description: Machine-readable error code.
767
+ example: invalid_request
768
+ enum:
769
+ - invalid_request
770
+ - invalid_url
771
+ - forbidden_url
772
+ - unauthorized
773
+ - rate_limited
774
+ - TIMEOUT
775
+ - BLOCKED
776
+ - NETWORK
777
+ - internal_error
778
+ message:
779
+ type: string
780
+ description: Human-readable error description.
781
+ example: "Missing or invalid \"url\" parameter."
782
+ hint:
783
+ type: string
784
+ description: Optional suggestion to fix the error.
785
+ example: "Try increasing timeout with ?wait=10000."
786
+ docs:
787
+ type: string
788
+ format: uri
789
+ description: Link to relevant documentation.
790
+ example: https://webpeel.dev/docs/api-reference#errors
791
+ metadata:
792
+ type: object
793
+ properties:
794
+ requestId:
795
+ type: string
796
+ description: Unique request ID for support and debugging.
797
+ example: req_abc123
798
+
799
+ ErrorResponse:
800
+ $ref: "#/components/schemas/Error"
801
+
802
+ # -------------------------------------------------------------------------
803
+ # Auth types
804
+ # -------------------------------------------------------------------------
805
+
806
+ UserObject:
807
+ type: object
808
+ properties:
809
+ id:
810
+ type: string
811
+ format: uuid
812
+ email:
813
+ type: string
814
+ format: email
815
+ tier:
816
+ type: string
817
+ enum: [free, starter, pro, enterprise]
818
+ example: free
819
+ weeklyLimit:
820
+ type: integer
821
+ description: Weekly request quota.
822
+ example: 500
823
+ burstLimit:
824
+ type: integer
825
+ description: Burst rate limit (requests per minute).
826
+ example: 50
827
+ rateLimit:
828
+ type: integer
829
+ description: Hourly rate limit.
830
+ example: 10
831
+ createdAt:
832
+ type: string
833
+ format: date-time
834
+
835
+ # -------------------------------------------------------------------------
836
+ # Screenshot types
837
+ # -------------------------------------------------------------------------
838
+
839
+ ScreenshotResult:
840
+ type: object
841
+ required: [success, data]
842
+ properties:
843
+ success:
844
+ type: boolean
845
+ const: true
846
+ data:
847
+ type: object
848
+ required: [url, screenshot, metadata]
849
+ properties:
850
+ url:
851
+ type: string
852
+ format: uri
853
+ description: Final URL after any redirects.
854
+ screenshot:
855
+ type: string
856
+ description: Data URL containing base64-encoded screenshot image (e.g., `data:image/png;base64,...`).
857
+ example: "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
858
+ metadata:
859
+ type: object
860
+ properties:
861
+ sourceURL:
862
+ type: string
863
+ format: uri
864
+ format:
865
+ type: string
866
+ enum: [png, jpeg]
867
+ width:
868
+ type: integer
869
+ example: 1280
870
+ height:
871
+ type: integer
872
+ example: 720
873
+ fullPage:
874
+ type: boolean
875
+
876
+ # -------------------------------------------------------------------------
877
+ # Batch types
878
+ # -------------------------------------------------------------------------
879
+
880
+ BatchJobResponse:
881
+ type: object
882
+ required: [success, id]
883
+ properties:
884
+ success:
885
+ type: boolean
886
+ const: true
887
+ id:
888
+ type: string
889
+ description: Job ID for polling status.
890
+ example: "job_a1b2c3d4"
891
+ url:
892
+ type: string
893
+ description: URL to poll for job status.
894
+ example: "/v1/batch/scrape/job_a1b2c3d4"
895
+
896
+ BatchJobStatus:
897
+ type: object
898
+ required: [success, status]
899
+ properties:
900
+ success:
901
+ type: boolean
902
+ const: true
903
+ status:
904
+ type: string
905
+ enum: [queued, processing, completed, failed, cancelled]
906
+ example: completed
907
+ total:
908
+ type: integer
909
+ description: Total number of URLs in the batch.
910
+ example: 10
911
+ completed:
912
+ type: integer
913
+ description: Number of URLs processed so far.
914
+ example: 10
915
+ creditsUsed:
916
+ type: integer
917
+ description: Number of credits consumed so far.
918
+ example: 10
919
+ data:
920
+ type: array
921
+ items:
922
+ $ref: "#/components/schemas/PeelResult"
923
+ description: Scrape results. Populated when status is `completed`.
924
+ error:
925
+ type: string
926
+ description: Error message if status is `failed`.
927
+ expiresAt:
928
+ type: string
929
+ format: date-time
930
+ description: When job data expires and is deleted.
931
+
932
+ # ---------------------------------------------------------------------------
933
+ # Reusable response headers
934
+ # ---------------------------------------------------------------------------
935
+
936
+ headers:
937
+ X-Cache:
938
+ schema:
939
+ type: string
940
+ enum: [HIT, MISS]
941
+ description: Whether the response was served from cache.
942
+ X-Cache-Age:
943
+ schema:
944
+ type: integer
945
+ description: Age of the cached response in seconds.
946
+ X-Credits-Used:
947
+ schema:
948
+ type: integer
949
+ description: Number of credits consumed by this request.
950
+ X-Processing-Time:
951
+ schema:
952
+ type: integer
953
+ description: Server-side processing time in milliseconds.
954
+ X-Request-Id:
955
+ schema:
956
+ type: string
957
+ description: Unique request identifier for debugging and support.
958
+ X-Fetch-Type:
959
+ schema:
960
+ type: string
961
+ enum: [basic, stealth, search, screenshot]
962
+ description: Fetch method used internally.
963
+ X-Auto-Budget:
964
+ schema:
965
+ type: integer
966
+ description: "Token budget automatically applied (default: 4000). Present only when auto-applied."
967
+ X-Degraded:
968
+ schema:
969
+ type: string
970
+ description: Reason for request degradation (e.g., quota exceeded).
971
+ X-RateLimit-Limit:
972
+ schema:
973
+ type: integer
974
+ description: Requests allowed per rate-limit window.
975
+ X-RateLimit-Remaining:
976
+ schema:
977
+ type: integer
978
+ description: Requests remaining in current rate-limit window.
979
+ X-RateLimit-Reset:
980
+ schema:
981
+ type: integer
982
+ description: Unix timestamp when the rate-limit window resets.
983
+
984
+ # ---------------------------------------------------------------------------
985
+ # Reusable responses
986
+ # ---------------------------------------------------------------------------
987
+
988
+ responses:
989
+ Unauthorized:
990
+ description: Missing or invalid API key.
991
+ content:
992
+ application/json:
993
+ schema:
994
+ $ref: "#/components/schemas/Error"
995
+ example:
996
+ success: false
997
+ error:
998
+ type: unauthorized
999
+ message: "API key required. Get one free at https://app.webpeel.dev"
1000
+ docs: "https://webpeel.dev/docs/api-reference#authentication"
1001
+ metadata:
1002
+ requestId: "req_abc123"
1003
+
1004
+ RateLimited:
1005
+ description: Rate limit exceeded.
1006
+ headers:
1007
+ X-RateLimit-Limit:
1008
+ $ref: "#/components/headers/X-RateLimit-Limit"
1009
+ X-RateLimit-Remaining:
1010
+ $ref: "#/components/headers/X-RateLimit-Remaining"
1011
+ X-RateLimit-Reset:
1012
+ $ref: "#/components/headers/X-RateLimit-Reset"
1013
+ Retry-After:
1014
+ schema:
1015
+ type: integer
1016
+ description: Seconds until the rate limit resets.
1017
+ content:
1018
+ application/json:
1019
+ schema:
1020
+ $ref: "#/components/schemas/Error"
1021
+ example:
1022
+ success: false
1023
+ error:
1024
+ type: rate_limited
1025
+ message: "Too many requests. Please wait before retrying."
1026
+ hint: "Upgrade your plan for higher limits."
1027
+ metadata:
1028
+ requestId: "req_abc123"
1029
+
1030
+ InternalError:
1031
+ description: Unexpected server error.
1032
+ content:
1033
+ application/json:
1034
+ schema:
1035
+ $ref: "#/components/schemas/Error"
1036
+ example:
1037
+ success: false
1038
+ error:
1039
+ type: internal_error
1040
+ message: "An unexpected error occurred. If this persists, check https://webpeel.dev/status"
1041
+ metadata:
1042
+ requestId: "req_abc123"
1043
+
1044
+ paths:
1045
+ # ===========================================================================
1046
+ # Health
1047
+ # ===========================================================================
1048
+
1049
+ /health:
1050
+ get:
1051
+ operationId: getHealth
1052
+ summary: Health check
1053
+ description: |
1054
+ Returns the current health status, version, and uptime of the server.
1055
+ No authentication required. Used by load balancers and monitoring tools.
1056
+ tags: [Health]
1057
+ security: []
1058
+ responses:
1059
+ "200":
1060
+ description: Server is healthy.
1061
+ content:
1062
+ application/json:
1063
+ schema:
1064
+ type: object
1065
+ required: [status, version, uptime, timestamp]
1066
+ properties:
1067
+ status:
1068
+ type: string
1069
+ const: healthy
1070
+ example: healthy
1071
+ version:
1072
+ type: string
1073
+ description: Server version (semver).
1074
+ example: 0.15.2
1075
+ uptime:
1076
+ type: integer
1077
+ description: Server uptime in seconds.
1078
+ example: 3600
1079
+ timestamp:
1080
+ type: string
1081
+ format: date-time
1082
+ description: Current server time (ISO 8601).
1083
+ example: "2025-02-25T08:00:00.000Z"
1084
+ example:
1085
+ status: healthy
1086
+ version: "0.15.2"
1087
+ uptime: 3600
1088
+ timestamp: "2025-02-25T08:00:00.000Z"
1089
+
1090
+ # ===========================================================================
1091
+ # OpenAPI Spec
1092
+ # ===========================================================================
1093
+
1094
+ /openapi.yaml:
1095
+ get:
1096
+ operationId: getOpenApiSpec
1097
+ summary: OpenAPI specification
1098
+ description: Returns this OpenAPI 3.1.0 specification in YAML format.
1099
+ tags: [Health]
1100
+ security: []
1101
+ responses:
1102
+ "200":
1103
+ description: The OpenAPI specification.
1104
+ content:
1105
+ application/yaml:
1106
+ schema:
1107
+ type: string
1108
+
1109
+ # ===========================================================================
1110
+ # Fetch
1111
+ # ===========================================================================
1112
+
1113
+ /v1/fetch:
1114
+ get:
1115
+ operationId: fetchUrlGet
1116
+ summary: Fetch a URL (GET)
1117
+ description: |
1118
+ Fetch any URL and return clean content in markdown, text, or HTML format.
1119
+
1120
+ **Quick start:**
1121
+ ```bash
1122
+ curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
1123
+ -H "Authorization: Bearer wp_YOUR_KEY"
1124
+ ```
1125
+
1126
+ **With browser rendering:**
1127
+ ```bash
1128
+ curl "https://api.webpeel.dev/v1/fetch?url=https://spa-example.com&render=true" \
1129
+ -H "Authorization: Bearer wp_YOUR_KEY"
1130
+ ```
1131
+
1132
+ Results are cached for 5 minutes by default. Pass `noCache=true` to bypass.
1133
+ tags: [Fetch]
1134
+ parameters:
1135
+ - name: url
1136
+ in: query
1137
+ required: true
1138
+ description: The URL to fetch. Must be HTTP or HTTPS. Max 2048 characters.
1139
+ schema:
1140
+ type: string
1141
+ format: uri
1142
+ example: https://example.com
1143
+ - name: render
1144
+ in: query
1145
+ schema:
1146
+ type: boolean
1147
+ default: false
1148
+ description: Use headless browser for JavaScript-heavy sites or SPAs.
1149
+ - name: wait
1150
+ in: query
1151
+ schema:
1152
+ type: integer
1153
+ minimum: 0
1154
+ maximum: 60000
1155
+ description: Milliseconds to wait after page load (used with `render=true`).
1156
+ example: 2000
1157
+ - name: format
1158
+ in: query
1159
+ schema:
1160
+ type: string
1161
+ enum: [markdown, text, html]
1162
+ default: markdown
1163
+ description: Output format.
1164
+ - name: includeTags
1165
+ in: query
1166
+ schema:
1167
+ type: string
1168
+ description: Comma-separated HTML elements to include (e.g., `article,main`).
1169
+ example: "article,main"
1170
+ - name: excludeTags
1171
+ in: query
1172
+ schema:
1173
+ type: string
1174
+ description: Comma-separated HTML elements to remove (e.g., `nav,footer`).
1175
+ example: "nav,footer,header"
1176
+ - name: images
1177
+ in: query
1178
+ schema:
1179
+ type: boolean
1180
+ default: false
1181
+ description: Include image URLs and alt text in the response.
1182
+ - name: location
1183
+ in: query
1184
+ schema:
1185
+ type: string
1186
+ description: ISO 3166-1 alpha-2 country code for geo-targeted content.
1187
+ example: US
1188
+ - name: languages
1189
+ in: query
1190
+ schema:
1191
+ type: string
1192
+ description: Comma-separated language preferences (e.g., `en-US,de`).
1193
+ - name: onlyMainContent
1194
+ in: query
1195
+ schema:
1196
+ type: boolean
1197
+ default: false
1198
+ description: Shortcut to strip boilerplate and return only main/article content.
1199
+ - name: actions
1200
+ in: query
1201
+ schema:
1202
+ type: string
1203
+ description: JSON-encoded array of `PageAction` objects to execute before extraction.
1204
+ example: '[{"type":"click","selector":"#load-more"}]'
1205
+ - name: maxAge
1206
+ in: query
1207
+ schema:
1208
+ type: integer
1209
+ description: Maximum acceptable cache age in milliseconds. Default is 172800000 (2 days).
1210
+ - name: noCache
1211
+ in: query
1212
+ schema:
1213
+ type: boolean
1214
+ default: false
1215
+ description: Bypass cache and force a fresh fetch.
1216
+ - name: cacheTtl
1217
+ in: query
1218
+ schema:
1219
+ type: integer
1220
+ description: Cache TTL in seconds for this response. Default is 300 (5 minutes).
1221
+ - name: stream
1222
+ in: query
1223
+ schema:
1224
+ type: boolean
1225
+ default: false
1226
+ description: Prepare a streaming response.
1227
+ - name: budget
1228
+ in: query
1229
+ schema:
1230
+ type: integer
1231
+ description: |
1232
+ Smart token budget — distill content to N tokens using heuristic
1233
+ compression (no LLM required). Default: 4000. Set to 0 to disable.
1234
+ example: 4000
1235
+ - name: question
1236
+ in: query
1237
+ schema:
1238
+ type: string
1239
+ description: Ask a question about the content using BM25 relevance scoring (no LLM key needed).
1240
+ example: "What is the pricing?"
1241
+ - name: readable
1242
+ in: query
1243
+ schema:
1244
+ type: boolean
1245
+ default: false
1246
+ description: Reader mode — extract only the main article, strip all noise.
1247
+ - name: stealth
1248
+ in: query
1249
+ schema:
1250
+ type: boolean
1251
+ default: false
1252
+ description: Stealth mode to bypass bot detection on sites such as Amazon and LinkedIn.
1253
+ - name: screenshot
1254
+ in: query
1255
+ schema:
1256
+ type: boolean
1257
+ default: false
1258
+ description: Also capture a screenshot (returned as base64 PNG).
1259
+ - name: maxTokens
1260
+ in: query
1261
+ schema:
1262
+ type: integer
1263
+ description: Maximum token count — hard truncation.
1264
+ - name: selector
1265
+ in: query
1266
+ schema:
1267
+ type: string
1268
+ description: CSS selector to extract specific content only.
1269
+ example: "article.post"
1270
+ - name: exclude
1271
+ in: query
1272
+ schema:
1273
+ type: string
1274
+ description: Comma-separated CSS selectors to exclude from output.
1275
+ example: ".ads,.sidebar"
1276
+ - name: fullPage
1277
+ in: query
1278
+ schema:
1279
+ type: boolean
1280
+ default: false
1281
+ description: Disable smart pruning and return the complete page content.
1282
+ - name: raw
1283
+ in: query
1284
+ schema:
1285
+ type: boolean
1286
+ default: false
1287
+ description: Skip content extraction — return raw HTML without stripping boilerplate.
1288
+ - name: lite
1289
+ in: query
1290
+ schema:
1291
+ type: boolean
1292
+ default: false
1293
+ description: Lite mode — minimal processing, maximum speed (~50% faster).
1294
+ - name: timeout
1295
+ in: query
1296
+ schema:
1297
+ type: integer
1298
+ description: Request timeout in milliseconds. Default is 30000.
1299
+ example: 30000
1300
+ - name: storeInCache
1301
+ in: query
1302
+ schema:
1303
+ type: boolean
1304
+ default: true
1305
+ description: Whether to cache the response. Set to `false` to skip caching.
1306
+ responses:
1307
+ "200":
1308
+ description: Successfully fetched the URL.
1309
+ headers:
1310
+ X-Cache:
1311
+ $ref: "#/components/headers/X-Cache"
1312
+ X-Cache-Age:
1313
+ $ref: "#/components/headers/X-Cache-Age"
1314
+ X-Credits-Used:
1315
+ $ref: "#/components/headers/X-Credits-Used"
1316
+ X-Processing-Time:
1317
+ $ref: "#/components/headers/X-Processing-Time"
1318
+ X-Request-Id:
1319
+ $ref: "#/components/headers/X-Request-Id"
1320
+ X-Fetch-Type:
1321
+ $ref: "#/components/headers/X-Fetch-Type"
1322
+ X-Auto-Budget:
1323
+ $ref: "#/components/headers/X-Auto-Budget"
1324
+ X-Degraded:
1325
+ $ref: "#/components/headers/X-Degraded"
1326
+ X-RateLimit-Limit:
1327
+ $ref: "#/components/headers/X-RateLimit-Limit"
1328
+ X-RateLimit-Remaining:
1329
+ $ref: "#/components/headers/X-RateLimit-Remaining"
1330
+ X-RateLimit-Reset:
1331
+ $ref: "#/components/headers/X-RateLimit-Reset"
1332
+ content:
1333
+ application/json:
1334
+ schema:
1335
+ $ref: "#/components/schemas/PeelResult"
1336
+ example:
1337
+ url: https://example.com
1338
+ title: "Example Domain"
1339
+ content: "# Example Domain\n\nThis domain is for use in illustrative examples in documents."
1340
+ metadata:
1341
+ description: "Example Domain"
1342
+ language: en
1343
+ links: ["https://www.iana.org/domains/example"]
1344
+ tokens: 42
1345
+ method: simple
1346
+ elapsed: 187
1347
+ quality: 0.91
1348
+ fingerprint: "a1b2c3d4e5f6a1b2"
1349
+ "400":
1350
+ description: Invalid request parameters.
1351
+ content:
1352
+ application/json:
1353
+ schema:
1354
+ $ref: "#/components/schemas/Error"
1355
+ examples:
1356
+ missing_url:
1357
+ summary: Missing URL parameter
1358
+ value:
1359
+ success: false
1360
+ error:
1361
+ type: invalid_request
1362
+ message: 'Missing or invalid "url" parameter. Pass a URL as a query parameter: GET /v1/fetch?url=https://example.com'
1363
+ hint: 'curl "https://api.webpeel.dev/v1/fetch?url=https://example.com"'
1364
+ docs: https://webpeel.dev/docs/api-reference#fetch
1365
+ metadata:
1366
+ requestId: "req_abc123"
1367
+ invalid_url:
1368
+ summary: Malformed URL
1369
+ value:
1370
+ success: false
1371
+ error:
1372
+ type: invalid_url
1373
+ message: "Invalid URL format"
1374
+ metadata:
1375
+ requestId: "req_abc123"
1376
+ "401":
1377
+ $ref: "#/components/responses/Unauthorized"
1378
+ "403":
1379
+ description: Target site is blocking automated requests.
1380
+ content:
1381
+ application/json:
1382
+ schema:
1383
+ $ref: "#/components/schemas/Error"
1384
+ example:
1385
+ success: false
1386
+ error:
1387
+ type: BLOCKED
1388
+ message: "This site actively blocks automated requests."
1389
+ hint: "Try adding render=true or stealth=true."
1390
+ docs: https://webpeel.dev/docs/api-reference#errors
1391
+ metadata:
1392
+ requestId: "req_abc123"
1393
+ "429":
1394
+ $ref: "#/components/responses/RateLimited"
1395
+ "502":
1396
+ description: Network error — could not reach the target URL.
1397
+ content:
1398
+ application/json:
1399
+ schema:
1400
+ $ref: "#/components/schemas/Error"
1401
+ example:
1402
+ success: false
1403
+ error:
1404
+ type: NETWORK
1405
+ message: "Could not reach https://example.com — connection refused."
1406
+ metadata:
1407
+ requestId: "req_abc123"
1408
+ "504":
1409
+ description: Request timed out.
1410
+ content:
1411
+ application/json:
1412
+ schema:
1413
+ $ref: "#/components/schemas/Error"
1414
+ example:
1415
+ success: false
1416
+ error:
1417
+ type: TIMEOUT
1418
+ message: "Request timed out after 30000ms"
1419
+ hint: "Try increasing timeout with ?timeout=60000, or use render=true for JS-heavy sites."
1420
+ docs: https://webpeel.dev/docs/api-reference#errors
1421
+ metadata:
1422
+ requestId: "req_abc123"
1423
+ "500":
1424
+ $ref: "#/components/responses/InternalError"
1425
+
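Reviewer sketch (not part of the package): the GET operation above passes every option as a query parameter, so booleans and the JSON-encoded `actions` array need to be serialised before the URL is built. The helper name `build_fetch_url` is hypothetical; the host comes from the curl examples in the spec, and the lowercase `true`/`false` spelling follows the `render=true` example.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.webpeel.dev"  # host as used in the spec's curl examples


def build_fetch_url(url, **options):
    """Build a GET /v1/fetch URL from keyword options (hypothetical helper)."""
    params = {"url": url}
    for name, value in options.items():
        if value is None:
            continue  # omit unset options so server defaults apply
        if isinstance(value, bool):
            # Checked before other types because bool is a subclass of int.
            params[name] = "true" if value else "false"
        elif isinstance(value, (list, dict)):
            # e.g. `actions`, described as a JSON-encoded array.
            params[name] = json.dumps(value, separators=(",", ":"))
        else:
            params[name] = str(value)
    return f"{BASE_URL}/v1/fetch?{urlencode(params)}"
```

The resulting URL still needs the `Authorization: Bearer wp_YOUR_KEY` header when it is requested.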
1426
+ post:
1427
+ operationId: fetchUrlPost
1428
+ summary: Fetch a URL (POST)
1429
+ description: |
1430
+ POST variant of `/v1/fetch`. Accepts a JSON body with all the same options as the GET
1431
+ endpoint, plus **inline LLM extraction** (BYOK) via the `extract` field.
1432
+
1433
+ Use POST when you need to:
1434
+ - Pass complex `actions` arrays
1435
+ - Use inline LLM extraction (JSON Schema or prompt)
1436
+ - Pass complex CSS selector/tag arrays
1437
+
1438
+ **Quick start:**
1439
+ ```bash
1440
+ curl -X POST https://api.webpeel.dev/v1/fetch \
1441
+ -H "Authorization: Bearer wp_YOUR_KEY" \
1442
+ -H "Content-Type: application/json" \
1443
+ -d '{"url": "https://example.com"}'
1444
+ ```
1445
+
1446
+ **With inline extraction:**
1447
+ ```bash
1448
+ curl -X POST https://api.webpeel.dev/v1/fetch \
1449
+ -H "Authorization: Bearer wp_YOUR_KEY" \
1450
+ -H "Content-Type: application/json" \
1451
+ -d '{
1452
+ "url": "https://shop.example.com/product",
1453
+ "extract": {
1454
+ "schema": {
1455
+ "type": "object",
1456
+ "properties": {
1457
+ "name": {"type": "string"},
1458
+ "price": {"type": "number"}
1459
+ }
1460
+ }
1461
+ },
1462
+ "llmProvider": "openai",
1463
+ "llmApiKey": "sk-..."
1464
+ }'
1465
+ ```
1466
+ tags: [Fetch]
1467
+ requestBody:
1468
+ required: true
1469
+ content:
1470
+ application/json:
1471
+ schema:
1472
+ $ref: "#/components/schemas/FetchPostBody"
1473
+ examples:
1474
+ basic:
1475
+ summary: Basic fetch
1476
+ value:
1477
+ url: https://example.com
1478
+ with_render:
1479
+ summary: Browser rendering for SPA
1480
+ value:
1481
+ url: https://spa-example.com
1482
+ render: true
1483
+ wait: 2000
1484
+ format: markdown
1485
+ with_actions:
1486
+ summary: Fill in and submit a login form, then extract
1487
+ value:
1488
+ url: https://example.com/login
1489
+ render: true
1490
+ actions:
1491
+ - type: fill
1492
+ selector: "#email"
1493
+ value: user@example.com
1494
+ - type: fill
1495
+ selector: "#password"
1496
+ value: "secret"
1497
+ - type: click
1498
+ selector: "[type=submit]"
1499
+ budget: 4000
1500
+ with_extraction:
1501
+ summary: Inline LLM extraction (BYOK)
1502
+ value:
1503
+ url: https://shop.example.com/product/123
1504
+ extract:
1505
+ schema:
1506
+ type: object
1507
+ properties:
1508
+ name:
1509
+ type: string
1510
+ price:
1511
+ type: number
1512
+ inStock:
1513
+ type: boolean
1514
+ llmProvider: openai
1515
+ llmApiKey: "sk-..."
1516
+ llmModel: "gpt-4o-mini"
1517
+ responses:
1518
+ "200":
1519
+ description: Successfully fetched the URL.
1520
+ headers:
1521
+ X-Cache:
1522
+ $ref: "#/components/headers/X-Cache"
1523
+ X-Credits-Used:
1524
+ $ref: "#/components/headers/X-Credits-Used"
1525
+ X-Processing-Time:
1526
+ $ref: "#/components/headers/X-Processing-Time"
1527
+ X-Request-Id:
1528
+ $ref: "#/components/headers/X-Request-Id"
1529
+ X-Fetch-Type:
1530
+ $ref: "#/components/headers/X-Fetch-Type"
1531
+ X-Auto-Budget:
1532
+ $ref: "#/components/headers/X-Auto-Budget"
1533
+ X-RateLimit-Limit:
1534
+ $ref: "#/components/headers/X-RateLimit-Limit"
1535
+ X-RateLimit-Remaining:
1536
+ $ref: "#/components/headers/X-RateLimit-Remaining"
1537
+ X-RateLimit-Reset:
1538
+ $ref: "#/components/headers/X-RateLimit-Reset"
1539
+ content:
1540
+ application/json:
1541
+ schema:
1542
+ $ref: "#/components/schemas/PeelResult"
1543
+ examples:
1544
+ basic_response:
1545
+ summary: Basic fetch response
1546
+ value:
1547
+ url: https://example.com
1548
+ title: "Example Domain"
1549
+ content: "# Example Domain\n\nThis domain is for use in illustrative examples."
1550
+ metadata:
1551
+ description: "Example Domain"
1552
+ links: ["https://www.iana.org/domains/example"]
1553
+ tokens: 42
1554
+ method: simple
1555
+ elapsed: 187
1556
+ extraction_response:
1557
+ summary: Response with LLM extraction
1558
+ value:
1559
+ url: https://shop.example.com/product/123
1560
+ title: "Acme Widget Pro"
1561
+ content: "# Acme Widget Pro\n\nPrice: $49.99..."
1562
+ metadata: {}
1563
+ links: []
1564
+ tokens: 312
1565
+ method: simple
1566
+ elapsed: 850
1567
+ json:
1568
+ name: "Acme Widget Pro"
1569
+ price: 49.99
1570
+ inStock: true
1571
+ "400":
1572
+ description: Invalid request parameters.
1573
+ content:
1574
+ application/json:
1575
+ schema:
1576
+ $ref: "#/components/schemas/Error"
1577
+ "401":
1578
+ $ref: "#/components/responses/Unauthorized"
1579
+ "429":
1580
+ $ref: "#/components/responses/RateLimited"
1581
+ "500":
1582
+ $ref: "#/components/responses/InternalError"
1583
+
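Reviewer sketch (not part of the package): how the inline-extraction request from the POST examples above might be assembled with the Python standard library. The function name `build_extract_request` is hypothetical; the field names (`extract.schema`, `llmProvider`, `llmApiKey`, `llmModel`) are taken from the examples in the spec. The request object is only constructed here, not sent.

```python
import json
import urllib.request


def build_extract_request(api_key, url, schema, llm_provider, llm_api_key, llm_model=None):
    """Prepare (but do not send) a POST /v1/fetch request with inline extraction."""
    body = {
        "url": url,
        "extract": {"schema": schema},
        "llmProvider": llm_provider,
        "llmApiKey": llm_api_key,
    }
    if llm_model is not None:
        body["llmModel"] = llm_model
    return urllib.request.Request(
        "https://api.webpeel.dev/v1/fetch",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; according to the `extraction_response` example, the extracted fields come back under the `json` key of the result.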
1584
+ /v2/scrape:
1585
+ post:
1586
+ operationId: scrapeUrlV2
1587
+ summary: Scrape a URL (v2, Firecrawl-compatible)
1588
+ description: |
1589
+ Alias for `POST /v1/fetch` with identical behaviour. Provided for
1590
+ **Firecrawl API compatibility** — existing Firecrawl integrations work
1591
+ without modification.
1592
+
1593
+ Accepts the same request body as `POST /v1/fetch`, including the
1594
+ `formats` array for Firecrawl-style JSON extraction.
1595
+ tags: [Fetch]
1596
+ requestBody:
1597
+ required: true
1598
+ content:
1599
+ application/json:
1600
+ schema:
1601
+ $ref: "#/components/schemas/FetchPostBody"
1602
+ examples:
1603
+ firecrawl_compat:
1604
+ summary: Firecrawl-compatible request
1605
+ value:
1606
+ url: https://example.com
1607
+ formats:
1608
+ - type: json
1609
+ schema:
1610
+ type: object
1611
+ properties:
1612
+ title:
1613
+ type: string
1614
+ description:
1615
+ type: string
1616
+ llmProvider: openai
1617
+ llmApiKey: "sk-..."
1618
+ responses:
1619
+ "200":
1620
+ description: Successfully scraped the URL.
1621
+ content:
1622
+ application/json:
1623
+ schema:
1624
+ $ref: "#/components/schemas/PeelResult"
1625
+ "400":
1626
+ description: Invalid request parameters.
1627
+ content:
1628
+ application/json:
1629
+ schema:
1630
+ $ref: "#/components/schemas/Error"
1631
+ "401":
1632
+ $ref: "#/components/responses/Unauthorized"
1633
+ "429":
1634
+ $ref: "#/components/responses/RateLimited"
1635
+ "500":
1636
+ $ref: "#/components/responses/InternalError"
1637
+
1638
+ # ===========================================================================
1639
+ # Search
1640
+ # ===========================================================================
1641
+
1642
+ /v1/search:
1643
+ get:
1644
+ operationId: searchWeb
1645
+ summary: Search the web
1646
+ description: |
1647
+ Search the web and return structured results with titles, URLs, and snippets.
1648
+
1649
+ **Quick start:**
1650
+ ```bash
1651
+ curl "https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5" \
1652
+ -H "Authorization: Bearer wp_YOUR_KEY"
1653
+ ```
1654
+
1655
+ **Search providers:**
1656
+ - `auto` (default) — picks the best available provider automatically
1657
+ - `duckduckgo` — free, no API key needed
1658
+ - `brave` — higher quality results; requires `searchApiKey` (BYOK)
1659
+
1660
+ **Data sources** (comma-separated via `sources`):
1661
+ - `web` (default) — standard web results
1662
+ - `news` — news articles
1663
+ - `images` — image results
1664
+
1665
+ Results are cached for 15 minutes.
1666
+ tags: [Search]
1667
+ parameters:
1668
+ - name: q
1669
+ in: query
1670
+ required: true
1671
+ description: Search query.
1672
+ schema:
1673
+ type: string
1674
+ example: "latest AI news"
1675
+ - name: count
1676
+ in: query
1677
+ schema:
1678
+ type: integer
1679
+ minimum: 1
1680
+ maximum: 10
1681
+ default: 5
1682
+ description: Number of results to return (1–10).
1683
+ example: 5
1684
+ - name: sources
1685
+ in: query
1686
+ schema:
1687
+ type: string
1688
+ default: web
1689
+ description: Comma-separated data sources — `web`, `news`, `images`.
1690
+ example: "web,news"
1691
+ - name: provider
1692
+ in: query
1693
+ schema:
1694
+ type: string
1695
+ enum: [auto, duckduckgo, brave, stealth]
1696
+ default: auto
1697
+ description: Search provider. Use `brave` with `searchApiKey` for higher quality.
1698
+ - name: searchApiKey
1699
+ in: query
1700
+ schema:
1701
+ type: string
1702
+ description: Brave Search API key (BYOK). Required when `provider=brave`.
1703
+ - name: scrapeResults
1704
+ in: query
1705
+ schema:
1706
+ type: boolean
1707
+ default: false
1708
+ description: |
1709
+ Fetch the full content of each result URL and include it in the `content` field.
1710
+ Significantly increases response time and credits used.
1711
+ - name: categories
1712
+ in: query
1713
+ schema:
1714
+ type: string
1715
+ description: |
1716
+ Comma-separated category filters: `github`, `pdf`, `docs`, `blog`, `news`,
1717
+ `video`, `social`. Filters results by URL pattern.
1718
+ example: "docs,github"
1719
+ - name: tbs
1720
+ in: query
1721
+ schema:
1722
+ type: string
1723
+ description: Time-based search filter (e.g., `qdr:d` for last 24h, `qdr:w` for last week).
1724
+ example: "qdr:d"
1725
+ - name: country
1726
+ in: query
1727
+ schema:
1728
+ type: string
1729
+ description: ISO 3166-1 alpha-2 country code for geo-specific results.
1730
+ example: US
1731
+ - name: location
1732
+ in: query
1733
+ schema:
1734
+ type: string
1735
+ description: Location string for localized results.
1736
+ example: "New York, NY"
1737
+ responses:
1738
+ "200":
1739
+ description: Search results.
1740
+ headers:
1741
+ X-Cache:
1742
+ $ref: "#/components/headers/X-Cache"
1743
+ X-Cache-Age:
1744
+ $ref: "#/components/headers/X-Cache-Age"
1745
+ X-Credits-Used:
1746
+ $ref: "#/components/headers/X-Credits-Used"
1747
+ X-Processing-Time:
1748
+ $ref: "#/components/headers/X-Processing-Time"
1749
+ X-Request-Id:
1750
+ $ref: "#/components/headers/X-Request-Id"
1751
+ X-RateLimit-Limit:
1752
+ $ref: "#/components/headers/X-RateLimit-Limit"
1753
+ X-RateLimit-Remaining:
1754
+ $ref: "#/components/headers/X-RateLimit-Remaining"
1755
+ X-RateLimit-Reset:
1756
+ $ref: "#/components/headers/X-RateLimit-Reset"
1757
+ content:
1758
+ application/json:
1759
+ schema:
1760
+ type: object
1761
+ required: [success, data]
1762
+ properties:
1763
+ success:
1764
+ type: boolean
1765
+ const: true
1766
+ data:
1767
+ type: object
1768
+ properties:
1769
+ web:
1770
+ type: array
1771
+ items:
1772
+ $ref: "#/components/schemas/SearchResult"
1773
+ description: Web search results. Present when `sources` includes `web`.
1774
+ news:
1775
+ type: array
1776
+ items:
1777
+ $ref: "#/components/schemas/SearchNewsResult"
1778
+ description: News results. Present when `sources` includes `news`.
1779
+ images:
1780
+ type: array
1781
+ items:
1782
+ $ref: "#/components/schemas/SearchImageResult"
1783
+ description: Image results. Present when `sources` includes `images`.
1784
+ example:
1785
+ success: true
1786
+ data:
1787
+ web:
1788
+ - title: "GPT-5 Released: What You Need to Know"
1789
+ url: https://techcrunch.com/2025/02/gpt5
1790
+ snippet: "OpenAI has released GPT-5, a major leap forward in AI capabilities..."
1791
+ - title: "Google Gemini 2.0 Ultra Benchmarks"
1792
+ url: https://blog.google/gemini-2-ultra
1793
+ snippet: "Gemini 2.0 Ultra achieves state-of-the-art results across all major benchmarks..."
1794
+ "400":
1795
+ description: Invalid request parameters.
1796
+ content:
1797
+ application/json:
1798
+ schema:
1799
+ $ref: "#/components/schemas/Error"
1800
+ example:
1801
+ success: false
1802
+ error:
1803
+ type: invalid_request
1804
+ message: 'Missing or invalid "q" parameter. Pass a search query: GET /v1/search?q=your+search+terms'
1805
+ hint: 'curl "https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5"'
1806
+ docs: https://webpeel.dev/docs/api-reference#search
1807
+ metadata:
1808
+ requestId: "req_abc123"
1809
+ "401":
1810
+ $ref: "#/components/responses/Unauthorized"
1811
+ "429":
1812
+ $ref: "#/components/responses/RateLimited"
1813
+ "500":
1814
+ description: Search request failed.
1815
+ content:
1816
+ application/json:
1817
+ schema:
1818
+ $ref: "#/components/schemas/Error"
1819
+ example:
1820
+ success: false
1821
+ error:
1822
+ type: internal_error
1823
+ message: "Search request failed. If using Brave provider, verify your API key. Otherwise try again."
1824
+ hint: "Free search uses DuckDuckGo (no key required). For higher quality, add provider=brave&searchApiKey=YOUR_KEY"
1825
+ docs: https://webpeel.dev/docs/api-reference#search
1826
+ metadata:
1827
+ requestId: "req_abc123"
1828
+
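Reviewer sketch (not part of the package): the search response groups results under `data.web`, `data.news`, and `data.images`, each present only when requested via `sources`. A client could flatten them as below. The function name `iter_search_hits` is hypothetical, and raising `ValueError` on `success: false` is an assumed error-handling choice.

```python
def iter_search_hits(payload):
    """Yield (source, item) pairs from a /v1/search response body."""
    if not payload.get("success"):
        # Error bodies carry the message under error.message in the spec's examples.
        message = payload.get("error", {}).get("message", "search failed")
        raise ValueError(message)
    data = payload.get("data", {})
    for source in ("web", "news", "images"):
        # A source key is absent when it was not requested.
        for item in data.get(source, []):
            yield source, item
```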
1829
+ # ===========================================================================
1830
+ # Crawl
1831
+ # ===========================================================================
1832
+
1833
+ /v1/crawl:
1834
+ post:
1835
+ operationId: crawlSite
1836
+ summary: Crawl a website
1837
+ description: |
1838
+ Start an asynchronous crawl of a website. The request returns a job ID once the job is
1838
+ created; the crawl then follows internal links up to the configured `maxDepth` and `limit`.
1840
+
1841
+ **Workflow:**
1842
+ 1. `POST /v1/crawl` → returns `{ success: true, id: "job_..." }`
1843
+ 2. Poll `GET /v1/crawl/{id}` until `status === "completed"`
1844
+ 3. Results are in the `data` array
1845
+
1846
+ **Quick start:**
1847
+ ```bash
1848
+ curl -X POST https://api.webpeel.dev/v1/crawl \
1849
+ -H "Authorization: Bearer wp_YOUR_KEY" \
1850
+ -H "Content-Type: application/json" \
1851
+ -d '{"url": "https://docs.example.com", "limit": 50, "maxDepth": 3}'
1852
+ ```
1853
+ tags: [Crawl]
1854
+ requestBody:
1855
+ required: true
1856
+ content:
1857
+ application/json:
1858
+ schema:
1859
+ type: object
1860
+ required: [url]
1861
+ properties:
1862
+ url:
1863
+ type: string
1864
+ format: uri
1865
+ description: The start URL to crawl. Must be HTTP or HTTPS.
1866
+ example: https://docs.example.com
1867
+ limit:
1868
+ type: integer
1869
+ minimum: 1
1870
+ maximum: 10000
1871
+ default: 100
1872
+ description: Maximum number of pages to crawl.
1873
+ example: 50
1874
+ maxDepth:
1875
+ type: integer
1876
+ minimum: 1
1877
+ maximum: 10
1878
+ default: 3
1879
+ description: Maximum link depth from the start URL.
1880
+ example: 3
1881
+ includePaths:
1882
+ type: array
1883
+ items:
1884
+ type: string
1885
+ description: |
1886
+ URL path patterns to include (glob or prefix match).
1887
+ Only URLs matching at least one pattern will be crawled.
1888
+ example: ["/docs/", "/blog/"]
1889
+ excludePaths:
1890
+ type: array
1891
+ items:
1892
+ type: string
1893
+ description: URL path patterns to exclude from crawling.
1894
+ example: ["/login", "/checkout", "/cart"]
1895
+ scrapeOptions:
1896
+ type: object
1897
+ description: Options to apply when fetching each page (same as POST /v1/fetch body).
1898
+ properties:
1899
+ format:
1900
+ type: string
1901
+ enum: [markdown, text, html]
1902
+ default: markdown
1903
+ budget:
1904
+ type: integer
1905
+ description: Smart token budget per page.
1906
+ example: 4000
1907
+ maxTokens:
1908
+ type: integer
1909
+ description: Hard token limit per page.
1910
+ webhook:
1911
+ type: string
1912
+ format: uri
1913
+ description: Webhook URL to notify when the crawl completes.
1914
+ example: https://your-server.com/webhook
1915
+ examples:
1916
+ basic_crawl:
1917
+ summary: Crawl a documentation site
1918
+ value:
1919
+ url: https://docs.example.com
1920
+ limit: 50
1921
+ maxDepth: 3
1922
+ includePaths: ["/docs/"]
1923
+ with_webhook:
1924
+ summary: Crawl with webhook notification
1925
+ value:
1926
+ url: https://docs.example.com
1927
+ limit: 100
1928
+ maxDepth: 2
1929
+ webhook: https://your-server.com/webhook/crawl-done
1930
+ responses:
1931
+ "200":
1932
+ description: Crawl job started.
1933
+ content:
1934
+ application/json:
1935
+ schema:
1936
+ type: object
1937
+ required: [success, id]
1938
+ properties:
1939
+ success:
1940
+ type: boolean
1941
+ const: true
1942
+ id:
1943
+ type: string
1944
+ description: Job ID. Use with `GET /v1/crawl/{id}` to check status.
1945
+ example: "job_a1b2c3d4"
1946
+ example:
1947
+ success: true
1948
+ id: "job_a1b2c3d4"
1949
+ "400":
1950
+ description: Invalid request parameters.
1951
+ content:
1952
+ application/json:
1953
+ schema:
1954
+ $ref: "#/components/schemas/Error"
1955
+ example:
1956
+ success: false
1957
+ error:
1958
+ type: invalid_request
1959
+ message: 'Missing or invalid "url" parameter'
1960
+ metadata:
1961
+ requestId: "req_abc123"
1962
+ "401":
1963
+ $ref: "#/components/responses/Unauthorized"
1964
+ "429":
1965
+ $ref: "#/components/responses/RateLimited"
1966
+ "500":
1967
+ $ref: "#/components/responses/InternalError"
1968
+
1969
+ /v1/crawl/{id}:
1970
+ get:
1971
+ operationId: getCrawlStatus
1972
+ summary: Get crawl job status
1973
+ description: |
1974
+ Poll the status and results of a crawl job started with `POST /v1/crawl`.
1975
+
1976
+ **Quick start:**
1977
+ ```bash
1978
+ curl "https://api.webpeel.dev/v1/crawl/job_a1b2c3d4" \
1979
+ -H "Authorization: Bearer wp_YOUR_KEY"
1980
+ ```
1981
+
1982
+ Poll until `status === "completed"` or `status === "failed"`.
1983
+ tags: [Crawl]
1984
+ parameters:
1985
+ - name: id
1986
+ in: path
1987
+ required: true
1988
+ description: Crawl job ID returned by `POST /v1/crawl`.
1989
+ schema:
1990
+ type: string
1991
+ example: "job_a1b2c3d4"
1992
+ responses:
1993
+ "200":
1994
+ description: Crawl job status and (when complete) results.
1995
+ content:
1996
+ application/json:
1997
+ schema:
1998
+ type: object
1999
+ required: [success, status, completed, total]
2000
+ properties:
2001
+ success:
2002
+ type: boolean
2003
+ const: true
2004
+ status:
2005
+ type: string
2006
+ enum: [queued, scraping, completed, failed, cancelled]
2007
+ description: "`scraping` is the in-progress state (matches Firecrawl format)."
2008
+ example: completed
2009
+ completed:
2010
+ type: integer
2011
+ description: Number of pages scraped so far.
2012
+ example: 47
2013
+ total:
2014
+ type: integer
2015
+ description: Total pages discovered.
2016
+ example: 50
2017
+ creditsUsed:
2018
+ type: integer
2019
+ description: Credits consumed so far.
2020
+ example: 47
2021
+ expiresAt:
2022
+ type: string
2023
+ format: date-time
2024
+ description: When job data expires.
2025
+ data:
2026
+ type: array
2027
+ description: Scraped page results. Present when `status === "completed"`.
2028
+ items:
2029
+ type: object
2030
+ properties:
2031
+ url:
2032
+ type: string
2033
+ format: uri
2034
+ markdown:
2035
+ type: string
2036
+ metadata:
2037
+ type: object
2038
+ properties:
2039
+ title:
2040
+ type: string
2041
+ description:
2042
+ type: string
2043
+ sourceURL:
2044
+ type: string
2045
+ format: uri
2046
+ links:
2047
+ type: array
2048
+ items:
2049
+ type: string
2050
+ format: uri
2051
+ example:
2052
+ success: true
2053
+ status: completed
2054
+ completed: 47
2055
+ total: 50
2056
+ creditsUsed: 47
2057
+ expiresAt: "2025-02-26T08:00:00.000Z"
2058
+ data:
2059
+ - url: https://docs.example.com/getting-started
2060
+ markdown: "# Getting Started\n\nWelcome to the docs..."
2061
+ metadata:
2062
+ title: "Getting Started"
2063
+ sourceURL: https://docs.example.com/getting-started
2064
+ links: ["https://docs.example.com/installation"]
2065
+ "401":
2066
+ $ref: "#/components/responses/Unauthorized"
2067
+ "404":
2068
+ description: Job not found.
2069
+ content:
2070
+ application/json:
2071
+ schema:
2072
+ $ref: "#/components/schemas/Error"
2073
+ "500":
2074
+ $ref: "#/components/responses/InternalError"
2075
+
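Reviewer sketch (not part of the package): the crawl workflow above is a start-then-poll loop. The helper below takes the status lookup as a callable so it can be exercised without network access. The name `wait_for_crawl`, the polling interval, and the poll cap are assumptions; treating `cancelled` as a terminal state alongside `completed` and `failed` is also an assumption based on the `status` enum.

```python
import time

# States after which polling stops (assumed terminal set, see lead-in).
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_crawl(get_status, job_id, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll get_status(job_id) until the job reaches a terminal state.

    get_status is any callable returning the parsed JSON body of
    GET /v1/crawl/{id}, for example a thin wrapper around an HTTP client.
    """
    for _ in range(max_polls):
        job = get_status(job_id)
        if job["status"] in TERMINAL_STATES:
            return job
        sleep(interval)
    raise TimeoutError(f"crawl {job_id} did not finish after {max_polls} polls")
```

On return, the caller should still check `job["status"]`, since the `data` array is only documented as present when the status is `completed`.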
2076
+ # ===========================================================================
2077
+ # Map
2078
+ # ===========================================================================
2079
+
2080
+ /v1/map:
2081
+ post:
2082
+ operationId: mapDomain
2083
+ summary: Map all URLs on a domain
2084
+ description: |
2085
+ Discover all accessible URLs on a domain by parsing its sitemap and
2086
+ following internal links. Returns a flat list of URLs.
2087
+
2088
+ Useful for building a URL inventory before running a batch fetch or crawl.
2089
+
2090
+ **Quick start:**
2091
+ ```bash
2092
+ curl -X POST https://api.webpeel.dev/v1/map \
2093
+ -H "Authorization: Bearer wp_YOUR_KEY" \
2094
+ -H "Content-Type: application/json" \
2095
+ -d '{"url": "https://example.com"}'
2096
+ ```
2097
+ tags: [Map]
2098
+ requestBody:
2099
+ required: true
2100
+ content:
2101
+ application/json:
2102
+ schema:
2103
+ type: object
2104
+ required: [url]
2105
+ properties:
2106
+ url:
2107
+ type: string
2108
+ format: uri
2109
+ description: The domain or URL to map. Must be HTTP or HTTPS.
2110
+ example: https://example.com
2111
+ limit:
2112
+ type: integer
2113
+ minimum: 1
2114
+ maximum: 5000
2115
+ default: 5000
2116
+ description: Maximum number of URLs to return.
2117
+ example: 1000
2118
+ search:
2119
+ type: string
2120
+ description: Optional keyword to filter results — only URLs containing this string are returned.
2121
+ example: "/blog/"
2122
+ examples:
2123
+ basic_map:
2124
+ summary: Map all URLs on a domain
2125
+ value:
2126
+ url: https://example.com
2127
+ limit: 500
2128
+ filtered_map:
2129
+ summary: Map only blog URLs
2130
+ value:
2131
+ url: https://example.com
2132
+ search: "/blog/"
2133
+ responses:
2134
+ "200":
2135
+ description: List of URLs discovered on the domain.
2136
+ content:
2137
+ application/json:
2138
+ schema:
2139
+ type: object
2140
+ required: [success, links]
2141
+ properties:
2142
+ success:
2143
+ type: boolean
2144
+ const: true
2145
+ links:
2146
+ type: array
2147
+ items:
2148
+ type: string
2149
+ format: uri
2150
+ description: Discovered URLs, deduplicated and normalised.
2151
+ example:
2152
+ success: true
2153
+ links:
2154
+ - https://example.com/
2155
+ - https://example.com/about
2156
+ - https://example.com/blog
2157
+ - https://example.com/blog/post-1
2158
+ - https://example.com/pricing
2159
+ "400":
2160
+ description: Invalid request parameters.
2161
+ content:
2162
+ application/json:
2163
+ schema:
2164
+ $ref: "#/components/schemas/Error"
2165
+ "401":
2166
+ $ref: "#/components/responses/Unauthorized"
2167
+ "429":
2168
+ $ref: "#/components/responses/RateLimited"
2169
+ "500":
2170
+ $ref: "#/components/responses/InternalError"
2171
+
2172
+ # ===========================================================================
2173
+ # Extract
2174
+ # ===========================================================================
2175
+
2176
+ /v1/extract:
2177
+ post:
2178
+ operationId: extractStructuredData
2179
+ summary: Extract structured data from a URL
2180
+ description: |
2181
+ Fetch a URL and extract structured data using an LLM (BYOK) guided by a
2182
+ JSON Schema and/or a natural language prompt.
2183
+
2184
+ This is a Firecrawl-compatible endpoint for structured data extraction.
2185
+ Supply your own LLM API key via `llmApiKey`; if it is omitted, the server's key is used when one is configured.
2186
+
2187
+ **Quick start:**
2188
+ ```bash
2189
+ curl -X POST https://api.webpeel.dev/v1/extract \
2190
+ -H "Authorization: Bearer wp_YOUR_KEY" \
2191
+ -H "Content-Type: application/json" \
2192
+ -d '{
2193
+ "url": "https://shop.example.com/product/123",
2194
+ "schema": {
2195
+ "type": "object",
2196
+ "properties": {
2197
+ "name": {"type": "string"},
2198
+ "price": {"type": "number"},
2199
+ "inStock": {"type": "boolean"}
2200
+ }
2201
+ },
2202
+ "llmApiKey": "sk-..."
2203
+ }'
2204
+ ```
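A client can check the "at least one of `schema` or `prompt`" rule before sending the request. The following is a minimal sketch under that assumption; `build_extract_payload` is a hypothetical helper name, and the field names are taken from the request body documented for this endpoint.

```python
def build_extract_payload(url, schema=None, prompt=None, llm_api_key=None, model=None):
    """Hypothetical helper: assemble a JSON-serialisable body for POST /v1/extract."""
    if schema is None and prompt is None:
        # Mirrors the documented rule that one of the two must be present.
        raise ValueError('Either "schema" or "prompt" is required')
    payload = {"url": url}
    # Only include optional fields that were actually supplied.
    if schema is not None:
        payload["schema"] = schema
    if prompt is not None:
        payload["prompt"] = prompt
    if llm_api_key is not None:
        payload["llmApiKey"] = llm_api_key
    if model is not None:
        payload["model"] = model
    return payload
```

Failing early on the client avoids a round trip that would otherwise end in a `400` response.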
2205
+ tags: [Extract]
2206
+ requestBody:
2207
+ required: true
2208
+ content:
2209
+ application/json:
2210
+ schema:
2211
+ type: object
2212
+ required: [url]
2213
+ properties:
2214
+ url:
2215
+ type: string
2216
+ format: uri
2217
+ description: The URL to fetch and extract from. Must be HTTP or HTTPS. Max 2048 characters.
2218
+ example: https://shop.example.com/product/123
2219
+ schema:
2220
+ type: object
2221
+ additionalProperties: true
2222
+ description: |
2223
+ JSON Schema defining the desired output structure.
2224
+ At least one of `schema` or `prompt` is required.
2225
+ example:
2226
+ type: object
2227
+ properties:
2228
+ name:
2229
+ type: string
2230
+ price:
2231
+ type: number
2232
+ inStock:
2233
+ type: boolean
2234
+ prompt:
2235
+ type: string
2236
+ description: |
2237
+ Natural language extraction instruction.
2238
+ At least one of `schema` or `prompt` is required.
2239
+ example: "Extract the product name, price in USD, and whether it is in stock."
2240
+ llmApiKey:
2241
+ type: string
2242
+ description: |
2243
+ Your LLM API key (BYOK). If not provided, uses the server's `OPENAI_API_KEY`
2244
+ environment variable (if configured).
2245
+ example: "sk-..."
2246
+ model:
2247
+ type: string
2248
+ description: LLM model to use. Defaults to `gpt-4o-mini`.
2249
+ example: "gpt-4o-mini"
2250
+ baseUrl:
2251
+ type: string
2252
+ format: uri
2253
+ description: |
2254
+ Custom LLM base URL (for OpenAI-compatible endpoints).
2255
+ Defaults to `https://api.openai.com/v1`.
2256
+ example: "https://api.openai.com/v1"
2257
+ examples:
2258
+ product_extraction:
2259
+ summary: Extract product details
2260
+ value:
2261
+ url: https://shop.example.com/product/123
2262
+ schema:
2263
+ type: object
2264
+ properties:
2265
+ name:
2266
+ type: string
2267
+ price:
2268
+ type: number
2269
+ inStock:
2270
+ type: boolean
2271
+ images:
2272
+ type: array
2273
+ items:
2274
+ type: string
2275
+ llmApiKey: "sk-..."
2276
+ prompt_only:
2277
+ summary: Prompt-guided extraction
2278
+ value:
2279
+ url: https://example.com/contact
2280
+ prompt: "Extract all email addresses, phone numbers, and physical addresses."
2281
+ llmApiKey: "sk-..."
2282
+ responses:
2283
+ "200":
2284
+ description: Structured data extracted successfully.
2285
+ content:
2286
+ application/json:
2287
+ schema:
2288
+ type: object
2289
+ required: [success, data, metadata]
2290
+ properties:
2291
+ success:
2292
+ type: boolean
2293
+ const: true
2294
+ data:
2295
+ description: Extracted data. Conforms to the provided schema when one is given; may be an object or an array.
2296
+ oneOf:
2297
+ - type: object
2298
+ additionalProperties: true
2299
+ - type: array
2300
+ items:
2301
+ type: object
2302
+ additionalProperties: true
2303
+ metadata:
2304
+ type: object
2305
+ properties:
2306
+ url:
2307
+ type: string
2308
+ format: uri
2309
+ title:
2310
+ type: string
2311
+ tokensUsed:
2312
+ type: integer
2313
+ description: Total LLM tokens consumed.
2314
+ model:
2315
+ type: string
2316
+ description: LLM model used.
2317
+ cost:
2318
+ type: number
2319
+ description: Approximate LLM cost in USD.
2320
+ elapsed:
2321
+ type: integer
2322
+ description: Total processing time in milliseconds.
2323
+ example:
2324
+ success: true
2325
+ data:
2326
+ name: "Acme Widget Pro"
2327
+ price: 49.99
2328
+ inStock: true
2329
+ metadata:
2330
+ url: https://shop.example.com/product/123
2331
+ title: "Acme Widget Pro – Shop"
2332
+ tokensUsed: 620
2333
+ model: "gpt-4o-mini"
2334
+ cost: 0.0003
2335
+ elapsed: 1250
2336
+ "400":
2337
+ description: Invalid request parameters.
2338
+ content:
2339
+ application/json:
2340
+ schema:
2341
+ $ref: "#/components/schemas/Error"
2342
+ examples:
2343
+ missing_url:
2344
+ value:
2345
+ success: false
2346
+ error:
2347
+ type: invalid_request
2348
+ message: 'Missing or invalid "url" field in request body.'
2349
+ metadata:
2350
+ requestId: "req_abc123"
2351
+ missing_schema_and_prompt:
2352
+ value:
2353
+ success: false
2354
+ error:
2355
+ type: invalid_request
2356
+ message: 'Either "schema" or "prompt" is required for structured extraction.'
2357
+ metadata:
2358
+ requestId: "req_abc123"
2359
+ "401":
2360
+ $ref: "#/components/responses/Unauthorized"
2361
+ "429":
2362
+ $ref: "#/components/responses/RateLimited"
2363
+ "500":
2364
+ $ref: "#/components/responses/InternalError"
2365
+
2366
+ /v1/extract/auto:
2367
+ get:
2368
+ operationId: autoExtract
2369
+ summary: Auto-detect and extract structured data
2370
+ description: |
2371
+ Fetch a URL and automatically detect the page type (article, product, recipe,
2372
+ job listing, event, profile, etc.), then extract structured data without
2373
+ requiring an LLM API key.
2374
+
2375
+ Uses heuristic extraction — no LLM required, always free to use.
2376
+
2377
+ **Quick start:**
2378
+ ```bash
2379
+ curl "https://api.webpeel.dev/v1/extract/auto?url=https://shop.example.com/product/123" \
2380
+ -H "Authorization: Bearer wp_YOUR_KEY"
2381
+ ```
2382
+ tags: [Extract]
2383
+ parameters:
2384
+ - name: url
2385
+ in: query
2386
+ required: true
2387
+ description: The URL to fetch and auto-extract from.
2388
+ schema:
2389
+ type: string
2390
+ format: uri
2391
+ example: https://shop.example.com/product/123
2392
+ responses:
2393
+ "200":
2394
+ description: Auto-extracted structured data.
2395
+ content:
2396
+ application/json:
2397
+ schema:
2398
+ type: object
2399
+ required: [url, pageType, structured]
2400
+ properties:
2401
+ url:
2402
+ type: string
2403
+ format: uri
2404
+ description: The URL that was fetched.
2405
+ pageType:
2406
+ type: string
2407
+ description: Detected page type.
2408
+ enum: [article, product, recipe, job, event, profile, generic]
2409
+ example: product
2410
+ structured:
2411
+ type: object
2412
+ additionalProperties: true
2413
+ description: Extracted structured data specific to the detected page type.
2414
+ examples:
2415
+ product_page:
2416
+ summary: Product page extraction
2417
+ value:
2418
+ url: https://shop.example.com/product/123
2419
+ pageType: product
2420
+ structured:
2421
+ type: product
2422
+ title: "Acme Widget Pro"
2423
+ price: "$49.99"
2424
+ description: "The best widget for all your needs."
2425
+ images: ["https://shop.example.com/img/widget.jpg"]
2426
+ article_page:
2427
+ summary: Article page extraction
2428
+ value:
2429
+ url: https://blog.example.com/post/ai-news
2430
+ pageType: article
2431
+ structured:
2432
+ type: article
2433
+ title: "AI News Roundup"
2434
+ author: "Jane Doe"
2435
+ publishedAt: "2025-02-25T08:00:00.000Z"
2436
+ wordCount: 1200
2437
+ "400":
2438
+ description: Missing or invalid URL.
2439
+ content:
2440
+ application/json:
2441
+ schema:
2442
+ $ref: "#/components/schemas/Error"
2443
+ "401":
2444
+ $ref: "#/components/responses/Unauthorized"
2445
+ "429":
2446
+ $ref: "#/components/responses/RateLimited"
2447
+ "500":
2448
+ $ref: "#/components/responses/InternalError"
2449
+
2450
+ # ===========================================================================
2451
+ # Answer
2452
+ # ===========================================================================
2453
+
2454
+ /v1/answer/quick:
2455
+ get:
2456
+ operationId: quickAnswer
2457
+ summary: Quick Q&A on a URL (LLM-free)
2458
+ description: |
2459
+ Fetch a URL and answer a question about its content using BM25 relevance scoring.
2460
+ No LLM API key required — purely heuristic, always fast.
2461
+
2462
+ Ideal for extracting specific facts from pages (pricing, contact info, specs, etc.)
2463
+ without incurring LLM costs.
2464
+
2465
+ **Quick start:**
2466
+ ```bash
2467
+ curl "https://api.webpeel.dev/v1/answer/quick?url=https://example.com/pricing&question=What+is+the+pro+plan+price%3F" \
2468
+ -H "Authorization: Bearer wp_YOUR_KEY"
2469
+ ```
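Because both `url` and `question` travel in the query string, they need to be percent-encoded. The sketch below shows one way to build the request URL with Python's standard library. `quick_answer_url` is a hypothetical helper, and sending booleans as the string `true` is an assumption based on the `?noCache=true` convention used elsewhere in this spec.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.webpeel.dev/v1/answer/quick"

def quick_answer_url(url, question, render=False, max_passages=3):
    """Hypothetical helper: build an encoded GET URL for /v1/answer/quick."""
    # Limits taken from the parameter descriptions for this endpoint.
    if len(url) > 2048:
        raise ValueError("url must be at most 2048 characters")
    if len(question) > 1000:
        raise ValueError("question must be at most 1000 characters")
    if not 1 <= max_passages <= 10:
        raise ValueError("maxPassages must be between 1 and 10")
    params = {"url": url, "question": question}
    # Only send optional parameters when they differ from the defaults.
    if render:
        params["render"] = "true"  # assumption: booleans are sent as "true"
    if max_passages != 3:
        params["maxPassages"] = str(max_passages)
    return f"{BASE_URL}?{urlencode(params)}"
```

`urlencode` encodes spaces as `+` and `?` as `%3F`, matching the encoding of the question in the curl example above.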
2470
+ tags: [Answer]
2471
+ parameters:
2472
+ - name: url
2473
+ in: query
2474
+ required: true
2475
+ description: The URL to fetch and query. Must be HTTP or HTTPS. Max 2048 characters.
2476
+ schema:
2477
+ type: string
2478
+ format: uri
2479
+ example: https://example.com/pricing
2480
+ - name: question
2481
+ in: query
2482
+ required: true
2483
+ description: The question to answer. Max 1000 characters.
2484
+ schema:
2485
+ type: string
2486
+ example: "What is the price of the pro plan?"
2487
+ - name: render
2488
+ in: query
2489
+ schema:
2490
+ type: boolean
2491
+ default: false
2492
+ description: Use browser rendering for JavaScript-heavy pages.
2493
+ - name: maxPassages
2494
+ in: query
2495
+ schema:
2496
+ type: integer
2497
+ minimum: 1
2498
+ maximum: 10
2499
+ default: 3
2500
+ description: Maximum number of relevant passages to return (1–10).
2501
+ example: 3
2502
+ responses:
2503
+ "200":
2504
+ description: Question answered successfully.
2505
+ headers:
2506
+ X-Processing-Time:
2507
+ $ref: "#/components/headers/X-Processing-Time"
2508
+ X-Credits-Used:
2509
+ $ref: "#/components/headers/X-Credits-Used"
2510
+ X-RateLimit-Limit:
2511
+ $ref: "#/components/headers/X-RateLimit-Limit"
2512
+ X-RateLimit-Remaining:
2513
+ $ref: "#/components/headers/X-RateLimit-Remaining"
2514
+ X-RateLimit-Reset:
2515
+ $ref: "#/components/headers/X-RateLimit-Reset"
2516
+ content:
2517
+ application/json:
2518
+ schema:
2519
+ $ref: "#/components/schemas/QuickAnswerResult"
2520
+ example:
2521
+ url: https://example.com/pricing
2522
+ title: "Pricing – Example"
2523
+ question: "What is the price of the pro plan?"
2524
+ answer: "The Pro plan costs $49 per month."
2525
+ confidence: 0.83
2526
+ passages:
2527
+ - "Pro plan: $49/month — includes unlimited projects and priority support."
2528
+ - "All plans include a 14-day free trial. No credit card required."
2529
+ source: https://example.com/pricing
2530
+ method: bm25
2531
+ "400":
2532
+ description: Invalid request parameters.
2533
+ content:
2534
+ application/json:
2535
+ schema:
2536
+ $ref: "#/components/schemas/Error"
2537
+ examples:
2538
+ missing_url:
2539
+ value:
2540
+ success: false
2541
+ error:
2542
+ type: invalid_request
2543
+ message: 'Missing or invalid "url" parameter'
2544
+ metadata:
2545
+ requestId: "req_abc123"
2546
+ missing_question:
2547
+ value:
2548
+ success: false
2549
+ error:
2550
+ type: invalid_request
2551
+ message: 'Missing or invalid "question" parameter'
2552
+ metadata:
2553
+ requestId: "req_abc123"
2554
+ "401":
2555
+ $ref: "#/components/responses/Unauthorized"
2556
+ "429":
2557
+ $ref: "#/components/responses/RateLimited"
2558
+ "500":
2559
+ $ref: "#/components/responses/InternalError"
2560
+
2561
+ # ===========================================================================
2562
+ # YouTube
2563
+ # ===========================================================================
2564
+
2565
+ /v1/youtube:
2566
+ get:
2567
+ operationId: getYouTubeTranscript
2568
+ summary: Extract YouTube transcript
2569
+ description: |
2570
+ Extract the full transcript and metadata from any YouTube video.
2571
+
2572
+ Returns the complete text transcript, timed segments, and video metadata
2573
+ (title, channel, duration, available languages).
2574
+
2575
+ **Quick start:**
2576
+ ```bash
2577
+ curl "https://api.webpeel.dev/v1/youtube?url=https://youtu.be/dQw4w9WgXcQ" \
2578
+ -H "Authorization: Bearer wp_YOUR_KEY"
2579
+ ```
2580
+
2581
+ **Supported URL formats:**
2582
+ - `https://www.youtube.com/watch?v=VIDEO_ID`
2583
+ - `https://youtu.be/VIDEO_ID`
2584
+ - `https://www.youtube.com/embed/VIDEO_ID`
2585
+ - `https://m.youtube.com/watch?v=VIDEO_ID`
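For illustration, the four URL formats listed above can all be reduced to the same video ID. The sketch below is a client-side approximation only: `extract_video_id` is a hypothetical helper, and the server's own parsing may accept more (or fewer) variants than these.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Hypothetical helper: return the video ID for the listed URL formats, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    video_id = ""
    if host == "youtu.be":
        # https://youtu.be/VIDEO_ID
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in ("www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=VIDEO_ID (also m.youtube.com)
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith("/embed/"):
            # https://www.youtube.com/embed/VIDEO_ID
            video_id = parsed.path[len("/embed/"):].split("/")[0]
    return video_id or None
```

A pre-check like this lets a client reject obviously unsupported URLs before spending a request on them.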
2586
+ tags: [YouTube]
2587
+ parameters:
2588
+ - name: url
2589
+ in: query
2590
+ required: true
2591
+ description: YouTube video URL. Any standard YouTube URL format is accepted.
2592
+ schema:
2593
+ type: string
2594
+ format: uri
2595
+ example: "https://youtu.be/dQw4w9WgXcQ"
2596
+ - name: language
2597
+ in: query
2598
+ schema:
2599
+ type: string
2600
+ default: en
2601
+ description: |
2602
+ Preferred transcript language (BCP 47 language code, e.g., `en`, `fr`, `de`).
2603
+ Falls back to any available language if the preferred one is unavailable.
2604
+ example: en
2605
+ responses:
2606
+ "200":
2607
+ description: Transcript extracted successfully.
2608
+ headers:
2609
+ X-Processing-Time:
2610
+ $ref: "#/components/headers/X-Processing-Time"
2611
+ X-Credits-Used:
2612
+ $ref: "#/components/headers/X-Credits-Used"
2613
+ X-RateLimit-Limit:
2614
+ $ref: "#/components/headers/X-RateLimit-Limit"
2615
+ X-RateLimit-Remaining:
2616
+ $ref: "#/components/headers/X-RateLimit-Remaining"
2617
+ X-RateLimit-Reset:
2618
+ $ref: "#/components/headers/X-RateLimit-Reset"
2619
+ content:
2620
+ application/json:
2621
+ schema:
2622
+ type: object
2623
+ required: [success, videoId, fullText, segments]
2624
+ properties:
2625
+ success:
2626
+ type: boolean
2627
+ const: true
2628
+ videoId:
2629
+ type: string
2630
+ description: YouTube video ID.
2631
+ example: dQw4w9WgXcQ
2632
+ title:
2633
+ type: string
2634
+ description: Video title.
2635
+ example: "Rick Astley - Never Gonna Give You Up (Official Music Video)"
2636
+ channel:
2637
+ type: string
2638
+ description: Channel name.
2639
+ example: "Rick Astley"
2640
+ duration:
2641
+ type: number
2642
+ description: Video duration in seconds.
2643
+ example: 213
2644
+ language:
2645
+ type: string
2646
+ description: Language of the returned transcript.
2647
+ example: en
2648
+ availableLanguages:
2649
+ type: array
2650
+ items:
2651
+ type: string
2652
+ description: All language codes for which transcripts are available.
2653
+ example: ["en", "de", "fr", "es"]
2654
+ fullText:
2655
+ type: string
2656
+ description: Complete transcript as a single string.
2657
+ example: "We're no strangers to love You know the rules and so do I..."
2658
+ segments:
2659
+ type: array
2660
+ items:
2661
+ $ref: "#/components/schemas/YouTubeSegment"
2662
+ description: Timed transcript segments.
2663
+ url:
2664
+ type: string
2665
+ format: uri
2666
+ description: Canonical YouTube URL.
2667
+ example: "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
2668
+ example:
2669
+ success: true
2670
+ videoId: dQw4w9WgXcQ
2671
+ title: "Rick Astley - Never Gonna Give You Up (Official Music Video)"
2672
+ channel: "Rick Astley"
2673
+ duration: 213
2674
+ language: en
2675
+ availableLanguages: ["en", "de", "fr"]
2676
+ fullText: "We're no strangers to love You know the rules and so do I..."
2677
+ segments:
2678
+ - start: 0
2679
+ end: 3.5
2680
+ text: "We're no strangers to love"
2681
+ - start: 3.5
2682
+ end: 7.2
2683
+ text: "You know the rules and so do I"
2684
+ url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
2685
+ "400":
2686
+ description: Invalid or unsupported YouTube URL.
2687
+ content:
2688
+ application/json:
2689
+ schema:
2690
+ $ref: "#/components/schemas/Error"
2691
+ examples:
2692
+ missing_url:
2693
+ value:
2694
+ success: false
2695
+ error:
2696
+ type: invalid_request
2697
+ message: 'Missing or invalid "url" parameter. Pass a YouTube URL: GET /v1/youtube?url=https://youtu.be/VIDEO_ID'
2698
+ metadata:
2699
+ requestId: "req_abc123"
2700
+ invalid_youtube_url:
2701
+ value:
2702
+ success: false
2703
+ error:
2704
+ type: invalid_request
2705
+ message: "The provided URL is not a valid YouTube video URL."
2706
+ metadata:
2707
+ requestId: "req_abc123"
2708
+ "401":
2709
+ $ref: "#/components/responses/Unauthorized"
2710
+ "404":
2711
+ description: Video has no captions/subtitles available.
2712
+ content:
2713
+ application/json:
2714
+ schema:
2715
+ $ref: "#/components/schemas/Error"
2716
+ example:
2717
+ success: false
2718
+ error:
2719
+ type: invalid_request
2720
+ message: "No captions are available for this video. The video may not have subtitles."
2721
+ metadata:
2722
+ requestId: "req_abc123"
2723
+ "429":
2724
+ $ref: "#/components/responses/RateLimited"
2725
+ "500":
2726
+ $ref: "#/components/responses/InternalError"
2727
+
2728
+ # ===========================================================================
2729
+ # Batch
2730
+ # ===========================================================================
2731
+
2732
+ /v1/batch/scrape:
2733
+ post:
2734
+ operationId: batchScrape
2735
+ summary: Batch scrape URLs
2736
+ description: |
2737
+ Submit a batch of up to 100 URLs for concurrent scraping. The job is
2738
+ queued immediately and processed asynchronously.
2739
+
2740
+ **Workflow:**
2741
+ 1. `POST /v1/batch/scrape` → returns `{ success: true, id: "job_..." }`
2742
+ 2. Poll `GET /v1/batch/scrape/{id}` to check progress
2743
+ 3. When `status === "completed"`, results are in the `data` field
2744
+
2745
+ **Alternatively**, pass a `webhook` URL to receive results automatically
2746
+ when the job completes.
2747
+
2748
+ **Quick start:**
2749
+ ```bash
2750
+ curl -X POST https://api.webpeel.dev/v1/batch/scrape \
2751
+ -H "Authorization: Bearer wp_YOUR_KEY" \
2752
+ -H "Content-Type: application/json" \
2753
+ -d '{
2754
+ "urls": ["https://example.com", "https://another.com"],
2755
+ "formats": ["markdown"]
2756
+ }'
2757
+ ```
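The polling workflow above can be sketched as a small loop. This is a minimal illustration, not an official client: `poll_batch_job` and its `get_status` callable are hypothetical, with `get_status(job_id)` assumed to perform `GET /v1/batch/scrape/{id}` and return the parsed JSON body. The interval and attempt cap are arbitrary defaults.

```python
import time

def poll_batch_job(get_status, job_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Hypothetical helper: poll until the job is completed or failed."""
    for attempt in range(max_attempts):
        status = get_status(job_id)
        # "completed" and "failed" are the terminal states named in this spec.
        if status.get("status") in ("completed", "failed"):
            return status
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Passing the HTTP call in as `get_status` keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub.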
2758
+ tags: [Batch]
2759
+ requestBody:
2760
+ required: true
2761
+ content:
2762
+ application/json:
2763
+ schema:
2764
+ type: object
2765
+ required: [urls]
2766
+ properties:
2767
+ urls:
2768
+ type: array
2769
+ items:
2770
+ type: string
2771
+ format: uri
2772
+ minItems: 1
2773
+ maxItems: 100
2774
+ description: URLs to scrape (1–100).
2775
+ example: ["https://example.com", "https://another-example.com"]
2776
+ formats:
2777
+ type: array
2778
+ items:
2779
+ type: string
2780
+ enum: [markdown, html, text]
2781
+ description: Output formats to include for each URL.
2782
+ example: ["markdown"]
2783
+ extract:
2784
+ $ref: "#/components/schemas/InlineExtract"
2785
+ maxTokens:
2786
+ type: integer
2787
+ description: Maximum token count per page.
2788
+ example: 4000
2789
+ webhook:
2790
+ type: string
2791
+ format: uri
2792
+ description: Webhook URL to receive results when the job completes.
2793
+ example: https://your-server.com/webhook
2794
+ examples:
2795
+ basic_batch:
2796
+ summary: Basic batch scrape
2797
+ value:
2798
+ urls:
2799
+ - https://example.com
2800
+ - https://another-example.com
2801
+ - https://third-example.com
2802
+ formats: [markdown]
2803
+ with_webhook:
2804
+ summary: Batch with webhook delivery
2805
+ value:
2806
+ urls:
2807
+ - https://example.com
2808
+ - https://another-example.com
2809
+ formats: [markdown]
2810
+ maxTokens: 4000
2811
+ webhook: https://your-server.com/webhook/results
2812
+ responses:
2813
+ "202":
2814
+ description: Batch job created and queued.
2815
+ content:
2816
+ application/json:
2817
+ schema:
2818
+ $ref: "#/components/schemas/BatchJobResponse"
2819
+ example:
2820
+ success: true
2821
+ id: "job_a1b2c3d4"
2822
+ url: "/v1/batch/scrape/job_a1b2c3d4"
2823
+ "400":
2824
+ description: Invalid request parameters.
2825
+ content:
2826
+ application/json:
2827
+ schema:
2828
+ $ref: "#/components/schemas/Error"
2829
+ examples:
2830
+ missing_urls:
2831
+ value:
2832
+ success: false
2833
+ error:
2834
+ type: invalid_request
2835
+ message: 'Missing or invalid "urls" parameter (must be non-empty array)'
2836
+ metadata:
2837
+ requestId: "req_abc123"
2838
+ too_many_urls:
2839
+ value:
2840
+ success: false
2841
+ error:
2842
+ type: invalid_request
2843
+ message: "Batch size too large (max 100 URLs)"
2844
+ metadata:
2845
+ requestId: "req_abc123"
2846
+ "401":
2847
+ $ref: "#/components/responses/Unauthorized"
2848
+ "429":
2849
+ $ref: "#/components/responses/RateLimited"
2850
+ "500":
2851
+ $ref: "#/components/responses/InternalError"
2852
+
2853
+ /v1/batch/scrape/{id}:
2854
+ get:
2855
+ operationId: getBatchStatus
2856
+ summary: Get batch job status
2857
+ description: |
2858
+ Poll the status and results of a batch scrape job.
2859
+
2860
+ Poll until `status === "completed"` or `status === "failed"`.
2861
+ tags: [Batch]
2862
+ parameters:
2863
+ - name: id
2864
+ in: path
2865
+ required: true
2866
+ description: Batch job ID returned by `POST /v1/batch/scrape`.
2867
+ schema:
2868
+ type: string
2869
+ example: "job_a1b2c3d4"
2870
+ responses:
2871
+ "200":
2872
+ description: Batch job status and (when complete) results.
2873
+ content:
2874
+ application/json:
2875
+ schema:
2876
+ $ref: "#/components/schemas/BatchJobStatus"
2877
+ example:
2878
+ success: true
2879
+ status: completed
2880
+ total: 3
2881
+ completed: 3
2882
+ creditsUsed: 3
2883
+ data:
2884
+ - url: https://example.com
2885
+ title: "Example Domain"
2886
+ content: "# Example Domain\n\n..."
2887
+ elapsed: 187
2888
+ "401":
2889
+ $ref: "#/components/responses/Unauthorized"
2890
+ "404":
2891
+ description: Job not found.
2892
+ content:
2893
+ application/json:
2894
+ schema:
2895
+ $ref: "#/components/schemas/Error"
2896
+ "500":
2897
+ $ref: "#/components/responses/InternalError"
2898
+
2899
+ delete:
2900
+ operationId: cancelBatchJob
2901
+ summary: Cancel batch job
2902
+ description: Cancel a queued or in-progress batch scrape job.
2903
+ tags: [Batch]
2904
+ parameters:
2905
+ - name: id
2906
+ in: path
2907
+ required: true
2908
+ description: Batch job ID to cancel.
2909
+ schema:
2910
+ type: string
2911
+ example: "job_a1b2c3d4"
2912
+ responses:
2913
+ "200":
2914
+ description: Job cancelled.
2915
+ content:
2916
+ application/json:
2917
+ schema:
2918
+ type: object
2919
+ properties:
2920
+ success:
2921
+ type: boolean
2922
+ const: true
2923
+ message:
2924
+ type: string
2925
+ example:
2926
+ success: true
2927
+ message: "Job cancelled"
2928
+ "400":
2929
+ description: Job cannot be cancelled (already completed or failed).
2930
+ content:
2931
+ application/json:
2932
+ schema:
2933
+ $ref: "#/components/schemas/Error"
2934
+ "401":
2935
+ $ref: "#/components/responses/Unauthorized"
2936
+ "404":
2937
+ description: Job not found.
2938
+ content:
2939
+ application/json:
2940
+ schema:
2941
+ $ref: "#/components/schemas/Error"
2942
+ "500":
2943
+ $ref: "#/components/responses/InternalError"
2944
+
2945
+ # ===========================================================================
2946
+ # Deep Fetch (Research)
2947
+ # ===========================================================================
2948
+
2949
+ /v1/deep-fetch:
2950
+ post:
2951
+ operationId: deepFetch
2952
+ summary: Multi-source deep research
2953
+ description: |
2954
+ Search the web, fetch the top results, and synthesise everything into a
2955
+ merged or structured report — all in a single API call.
2956
+
2957
+ No LLM API key required. Uses BM25 heuristics and text merging.
2958
+
2959
+ **Quick start:**
2960
+ ```bash
2961
+ curl -X POST https://api.webpeel.dev/v1/deep-fetch \
2962
+ -H "Authorization: Bearer wp_YOUR_KEY" \
2963
+ -H "Content-Type: application/json" \
2964
+ -d '{"query": "What are the best practices for React performance?"}'
2965
+ ```
2966
+ tags: [Research]
2967
+ requestBody:
2968
+ required: true
2969
+ content:
2970
+ application/json:
2971
+ schema:
2972
+ type: object
2973
+ required: [query]
2974
+ properties:
2975
+ query:
2976
+ type: string
2977
+ description: The research query or question.
2978
+ example: "What are the best practices for React performance optimization?"
2979
+ count:
2980
+ type: integer
2981
+ minimum: 1
2982
+ maximum: 10
2983
+ default: 5
2984
+ description: Number of search results to fetch and synthesise.
2985
+ example: 5
2986
+ format:
2987
+ type: string
2988
+ enum: [merged, structured, comparison]
2989
+ default: merged
2990
+ description: |
2991
+ Output format:
2992
+ - `merged` — combined content from all sources
2993
+ - `structured` — structured JSON output per source
2994
+ - `comparison` — side-by-side comparison of sources
2995
+ maxChars:
2996
+ type: integer
2997
+ default: 32000
2998
+ description: Maximum characters in the synthesised output.
2999
+ example: 16000
3000
+ examples:
3001
+ basic_research:
3002
+ summary: Basic research query
3003
+ value:
3004
+ query: "What are the best practices for React performance optimization?"
3005
+ count: 5
3006
+ format: merged
3007
+ structured_research:
3008
+ summary: Structured output
3009
+ value:
3010
+ query: "Compare PostgreSQL vs MySQL for high-traffic applications"
3011
+ count: 7
3012
+ format: structured
3013
+ maxChars: 20000
3014
+ responses:
3015
+ "200":
3016
+ description: Research results synthesised successfully.
3017
+ content:
3018
+ application/json:
3019
+ schema:
3020
+ type: object
3021
+ description: Research result shape varies by `format` parameter.
3022
+ properties:
3023
+ query:
3024
+ type: string
3025
+ sources:
3026
+ type: array
3027
+ items:
3028
+ type: object
3029
+ properties:
3030
+ url:
3031
+ type: string
3032
+ format: uri
3033
+ title:
3034
+ type: string
3035
+ content:
3036
+ type: string
3037
+ merged:
3038
+ type: string
3039
+ description: Merged content from all sources (when format=merged).
3040
+ structured:
3041
+ type: array
3042
+ description: Per-source structured output (when format=structured).
3043
+ items:
3044
+ type: object
3045
+ additionalProperties: true
3046
+ elapsed:
3047
+ type: integer
3048
+ description: Total processing time in milliseconds.
3049
+ example:
3050
+ query: "What are the best practices for React performance optimization?"
3051
+ sources:
3052
+ - url: https://react.dev/learn/render-and-commit
3053
+ title: "Render and Commit – React"
3054
+ content: "React renders your components in three steps..."
3055
+ - url: https://web.dev/react-performance
3056
+ title: "Optimizing React Performance – web.dev"
3057
+ content: "To optimize React performance, start with..."
3058
+ merged: "## React Performance Best Practices\n\nReact renders your components in three steps..."
3059
+ elapsed: 3200
3060
+ "400":
3061
+ description: Invalid request parameters.
3062
+ content:
3063
+ application/json:
3064
+ schema:
3065
+ $ref: "#/components/schemas/Error"
3066
+ example:
3067
+ success: false
3068
+ error:
3069
+ type: invalid_request
3070
+ message: "Missing required field: query"
3071
+ metadata:
3072
+ requestId: "req_abc123"
3073
+ "401":
3074
+ $ref: "#/components/responses/Unauthorized"
3075
+ "429":
3076
+ $ref: "#/components/responses/RateLimited"
3077
+ "500":
3078
+ $ref: "#/components/responses/InternalError"
3079
+
3080
+ # ===========================================================================
3081
+ # Screenshot
3082
+ # ===========================================================================
3083
+
3084
+ /v1/screenshot:
3085
+ post:
3086
+ operationId: takeScreenshot
3087
+ summary: Take a screenshot
3088
+ description: |
3089
+ Capture a screenshot of any URL. Returns the image as a base64-encoded data URL.
3090
+
3091
+ Supports full-page capture, custom viewport sizing, JPEG quality control,
3092
+ page actions (e.g., click a button before screenshotting), and stealth mode.
3093
+
3094
+ **Quick start:**
3095
+ ```bash
3096
+ curl -X POST https://api.webpeel.dev/v1/screenshot \
3097
+ -H "Authorization: Bearer wp_YOUR_KEY" \
3098
+ -H "Content-Type: application/json" \
3099
+ -d '{"url": "https://example.com", "fullPage": true}'
3100
+ ```
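Since the image comes back as a base64 data URL, a client has to decode it before writing it to disk. The following is a minimal sketch of that step. `decode_data_url` is a hypothetical helper, and it assumes the `data:<mime>;base64,<payload>` layout shown in the response example.

```python
import base64

def decode_data_url(data_url):
    """Hypothetical helper: split a base64 data URL into (mime_type, raw_bytes)."""
    header, sep, encoded = data_url.partition(",")
    # Expect a header such as "data:image/png;base64".
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(encoded)
```

The returned bytes can be written to a file opened in binary mode, with the extension chosen from the MIME type.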
3101
+ tags: [Screenshot]
3102
+ requestBody:
3103
+ required: true
3104
+ content:
3105
+ application/json:
3106
+ schema:
3107
+ type: object
3108
+ required: [url]
3109
+ properties:
3110
+ url:
3111
+ type: string
3112
+ format: uri
3113
+ description: The URL to screenshot. Must be HTTP or HTTPS. Max 2048 characters.
3114
+ example: https://example.com
3115
+ fullPage:
3116
+ type: boolean
3117
+ default: false
3118
+ description: Capture the full scrollable page (default is viewport only).
3119
+ width:
3120
+ type: integer
3121
+ minimum: 100
3122
+ maximum: 5000
3123
+ default: 1280
3124
+ description: Viewport width in pixels.
3125
+ height:
3126
+ type: integer
3127
+ minimum: 100
3128
+ maximum: 5000
3129
+ default: 720
3130
+ description: Viewport height in pixels.
3131
+ format:
3132
+ type: string
3133
+ enum: [png, jpeg, jpg]
3134
+ default: png
3135
+ description: Image format.
3136
+ quality:
3137
+ type: integer
3138
+ minimum: 1
3139
+ maximum: 100
3140
+ description: JPEG quality (1–100). Ignored for PNG.
3141
+ example: 85
3142
+ waitFor:
3143
+ type: integer
3144
+ minimum: 0
3145
+ maximum: 60000
3146
+ description: Milliseconds to wait after page load before capturing.
3147
+ example: 1000
3148
+ timeout:
3149
+ type: integer
3150
+ description: Request timeout in milliseconds. Default is 30000.
3151
+ example: 30000
3152
+ stealth:
3153
+ type: boolean
3154
+ default: false
3155
+ description: Use stealth mode to bypass bot detection.
3156
+ actions:
3157
+ type: array
3158
+ items:
3159
+ $ref: "#/components/schemas/PageAction"
3160
+ description: Browser actions to perform before taking the screenshot.
3161
+ headers:
3162
+ type: object
3163
+ additionalProperties:
3164
+ type: string
3165
+ description: Custom HTTP headers to send with the request.
3166
+ example:
3167
+ Accept-Language: "en-US"
3168
+ cookies:
3169
+ type: array
3170
+ items:
3171
+ type: string
3172
+ description: Cookies to set (key=value format).
3173
+ example: ["session_id=abc123", "pref=dark"]
3174
+ examples:
3175
+ viewport:
3176
+ summary: Viewport screenshot
3177
+ value:
3178
+ url: https://example.com
3179
+ width: 1440
3180
+ height: 900
3181
+ format: png
3182
+ full_page:
3183
+ summary: Full-page JPEG
3184
+ value:
3185
+ url: https://example.com
3186
+ fullPage: true
3187
+ format: jpeg
3188
+ quality: 85
3189
+ with_actions:
3190
+ summary: Screenshot after clicking a button
3191
+ value:
3192
+ url: https://example.com/dashboard
3193
+ actions:
3194
+ - type: click
3195
+ selector: "#expand-all"
3196
+ - type: wait
3197
+ ms: 1000
3198
+ fullPage: true
3199
+ responses:
3200
+ "200":
3201
+ description: Screenshot captured successfully.
3202
+ headers:
3203
+ X-Credits-Used:
3204
+ $ref: "#/components/headers/X-Credits-Used"
3205
+ X-Processing-Time:
3206
+ $ref: "#/components/headers/X-Processing-Time"
3207
+ X-Request-Id:
3208
+ $ref: "#/components/headers/X-Request-Id"
3209
+ X-RateLimit-Limit:
3210
+ $ref: "#/components/headers/X-RateLimit-Limit"
3211
+ X-RateLimit-Remaining:
3212
+ $ref: "#/components/headers/X-RateLimit-Remaining"
3213
+ X-RateLimit-Reset:
3214
+ $ref: "#/components/headers/X-RateLimit-Reset"
3215
+ content:
3216
+ application/json:
3217
+ schema:
3218
+ $ref: "#/components/schemas/ScreenshotResult"
3219
+ example:
3220
+ success: true
3221
+ data:
3222
+ url: https://example.com
3223
+ screenshot: "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
3224
+ metadata:
3225
+ sourceURL: https://example.com
3226
+ format: png
3227
+ width: 1280
3228
+ height: 720
3229
+ fullPage: false
3230
+ "400":
3231
+ description: Invalid request parameters.
3232
+ content:
3233
+ application/json:
3234
+ schema:
3235
+ $ref: "#/components/schemas/Error"
3236
+ examples:
3237
+ missing_url:
3238
+ value:
3239
+ success: false
3240
+ error:
3241
+ type: invalid_request
3242
+ message: 'Missing or invalid "url" parameter'
3243
+ metadata:
3244
+ requestId: "req_abc123"
3245
+ invalid_dimensions:
3246
+ value:
3247
+ success: false
3248
+ error:
3249
+ type: invalid_request
3250
+ message: "Invalid width: must be between 100 and 5000"
3251
+ metadata:
3252
+ requestId: "req_abc123"
3253
+ "401":
3254
+ $ref: "#/components/responses/Unauthorized"
3255
+ "429":
3256
+ $ref: "#/components/responses/RateLimited"
3257
+ "500":
3258
+ $ref: "#/components/responses/InternalError"
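The `200` example above returns `data.screenshot` as a base64 data URI. A minimal sketch for writing such a value to an image file follows. It assumes a `base64` that accepts `-d` (GNU coreutils); `save_data_uri` is a hypothetical helper name and not part of WebPeel.

```bash
# Decode a `data:<mime>;base64,<payload>` URI into a file.
# Hypothetical helper for illustration only.
save_data_uri() {
  local data_uri="$1"
  local out_file="$2"
  # Drop everything up to and including the first comma,
  # which leaves only the base64 payload.
  printf '%s' "${data_uri#*,}" | base64 -d > "$out_file"
}

# Usage, with the `data.screenshot` string from a response:
#   save_data_uri "$screenshot" page.png
```

Extracting the `data.screenshot` string from the JSON response is left to whatever JSON tool is at hand.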
3259
+
3260
+ # ===========================================================================
3261
+ # Watch
3262
+ # ===========================================================================
3263
+
3264
+ /v1/watch:
3265
+ post:
3266
+ operationId: createWatch
3267
+ summary: Create a URL watcher
3268
+ description: |
3269
+ Monitor a URL for content changes. The watcher checks the URL at the configured
3270
+ interval and notifies the configured `webhookUrl` when a change is detected.
3271
+
3272
+ **Quick start:**
3273
+ ```bash
3274
+ curl -X POST https://api.webpeel.dev/v1/watch \
3275
+ -H "Authorization: Bearer wp_YOUR_KEY" \
3276
+ -H "Content-Type: application/json" \
3277
+ -d '{
3278
+ "url": "https://example.com/pricing",
3279
+ "webhookUrl": "https://your-server.com/webhook",
3280
+ "checkIntervalMinutes": 60
3281
+ }'
3282
+ ```
3283
+ tags: [Watch]
3284
+ requestBody:
3285
+ required: true
3286
+ content:
3287
+ application/json:
3288
+ schema:
3289
+ type: object
3290
+ required: [url]
3291
+ properties:
3292
+ url:
3293
+ type: string
3294
+ format: uri
3295
+ description: The URL to monitor. Must be HTTP or HTTPS. Max 2048 characters.
3296
+ example: https://example.com/pricing
3297
+ webhookUrl:
3298
+ type: string
3299
+ format: uri
3300
+ description: Webhook URL to notify when changes are detected.
3301
+ example: https://your-server.com/webhook/price-change
3302
+ checkIntervalMinutes:
3303
+ type: integer
3304
+ minimum: 1
3305
+ maximum: 44640
3306
+ default: 60
3307
+ description: How often to check for changes, in minutes (1–44640, i.e., up to 31 days).
3308
+ example: 60
3309
+ selector:
3310
+ type: string
3311
+ description: CSS selector to limit change detection to a specific part of the page.
3312
+ example: ".price"
3313
+ examples:
3314
+ basic_watch:
3315
+ summary: Watch a pricing page
3316
+ value:
3317
+ url: https://example.com/pricing
3318
+ webhookUrl: https://your-server.com/webhook
3319
+ checkIntervalMinutes: 60
3320
+ selector_watch:
3321
+ summary: Watch only a specific element
3322
+ value:
3323
+ url: https://example.com/product/123
3324
+ webhookUrl: https://your-server.com/webhook/price
3325
+ selector: ".product-price"
3326
+ checkIntervalMinutes: 30
3327
+ responses:
3328
+ "201":
3329
+ description: Watcher created successfully.
3330
+ content:
3331
+ application/json:
3332
+ schema:
3333
+ type: object
3334
+ required: [ok, watch]
3335
+ properties:
3336
+ ok:
3337
+ type: boolean
3338
+ const: true
3339
+ watch:
3340
+ $ref: "#/components/schemas/WatchEntry"
3341
+ example:
3342
+ ok: true
3343
+ watch:
3344
+ id: "watch_a1b2c3d4"
3345
+ accountId: "550e8400-e29b-41d4-a716-446655440000"
3346
+ url: https://example.com/pricing
3347
+ webhookUrl: https://your-server.com/webhook
3348
+ checkIntervalMinutes: 60
3349
+ status: active
3350
+ createdAt: "2025-02-25T08:00:00.000Z"
3351
+ "400":
3352
+ description: Invalid request parameters.
3353
+ content:
3354
+ application/json:
3355
+ schema:
3356
+ $ref: "#/components/schemas/Error"
3357
+ examples:
3358
+ missing_url:
3359
+ value:
3360
+ success: false
3361
+ error:
3362
+ type: invalid_request
3363
+ message: 'Missing or invalid "url" parameter.'
3364
+ metadata:
3365
+ requestId: "req_abc123"
3366
+ invalid_interval:
3367
+ value:
3368
+ success: false
3369
+ error:
3370
+ type: invalid_request
3371
+ message: '"checkIntervalMinutes" must be between 1 and 44640 (31 days).'
3372
+ metadata:
3373
+ requestId: "req_abc123"
3374
+ "401":
3375
+ $ref: "#/components/responses/Unauthorized"
3376
+ "429":
3377
+ $ref: "#/components/responses/RateLimited"
3378
+ "500":
3379
+ $ref: "#/components/responses/InternalError"
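The `checkIntervalMinutes` bound of 44640 is 31 days expressed in minutes (31 × 24 × 60). A small sketch that converts whole days to minutes and enforces the documented range of 1 to 44640 follows; `days_to_interval_minutes` is a hypothetical helper name used only for illustration.

```bash
# Convert whole days to a checkIntervalMinutes value and
# reject anything outside the documented 1 to 44640 range.
days_to_interval_minutes() {
  local minutes=$(( $1 * 24 * 60 ))
  if [ "$minutes" -lt 1 ] || [ "$minutes" -gt 44640 ]; then
    echo "checkIntervalMinutes out of range: $minutes" >&2
    return 1
  fi
  echo "$minutes"
}

# Usage:
#   days_to_interval_minutes 7
```

Validating on the client avoids a round trip that would end in the `invalid_interval` error shown above.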
3380
+
3381
+ get:
3382
+ operationId: listWatches
3383
+ summary: List URL watchers
3384
+ description: |
3385
+ Return all URL watchers for the authenticated account.
3386
+
3387
+ **Quick start:**
3388
+ ```bash
3389
+ curl "https://api.webpeel.dev/v1/watch" \
3390
+ -H "Authorization: Bearer wp_YOUR_KEY"
3391
+ ```
3392
+ tags: [Watch]
3393
+ responses:
3394
+ "200":
3395
+ description: List of watchers.
3396
+ content:
3397
+ application/json:
3398
+ schema:
3399
+ type: object
3400
+ required: [ok, watches]
3401
+ properties:
3402
+ ok:
3403
+ type: boolean
3404
+ const: true
3405
+ watches:
3406
+ type: array
3407
+ items:
3408
+ $ref: "#/components/schemas/WatchEntry"
3409
+ example:
3410
+ ok: true
3411
+ watches:
3412
+ - id: "watch_a1b2c3d4"
3413
+ accountId: "550e8400-e29b-41d4-a716-446655440000"
3414
+ url: https://example.com/pricing
3415
+ webhookUrl: https://your-server.com/webhook
3416
+ checkIntervalMinutes: 60
3417
+ status: active
3418
+ createdAt: "2025-02-25T08:00:00.000Z"
3419
+ "401":
3420
+ $ref: "#/components/responses/Unauthorized"
3421
+ "500":
3422
+ $ref: "#/components/responses/InternalError"
3423
+
3424
+ /v1/watch/{id}:
3425
+ get:
3426
+ operationId: getWatch
3427
+ summary: Get a URL watcher
3428
+ description: Retrieve a specific URL watcher by ID.
3429
+ tags: [Watch]
3430
+ parameters:
3431
+ - name: id
3432
+ in: path
3433
+ required: true
3434
+ description: Watcher ID.
3435
+ schema:
3436
+ type: string
3437
+ example: "watch_a1b2c3d4"
3438
+ responses:
3439
+ "200":
3440
+ description: Watcher details.
3441
+ content:
3442
+ application/json:
3443
+ schema:
3444
+ type: object
3445
+ required: [ok, watch]
3446
+ properties:
3447
+ ok:
3448
+ type: boolean
3449
+ const: true
3450
+ watch:
3451
+ $ref: "#/components/schemas/WatchEntry"
3452
+ "401":
3453
+ $ref: "#/components/responses/Unauthorized"
3454
+ "403":
3455
+ description: Access denied — watcher belongs to another account.
3456
+ content:
3457
+ application/json:
3458
+ schema:
3459
+ $ref: "#/components/schemas/Error"
3460
+ "404":
3461
+ description: Watcher not found.
3462
+ content:
3463
+ application/json:
3464
+ schema:
3465
+ $ref: "#/components/schemas/Error"
3466
+ "500":
3467
+ $ref: "#/components/responses/InternalError"
3468
+
3469
+ delete:
3470
+ operationId: deleteWatch
3471
+ summary: Delete a URL watcher
3472
+ description: |
3473
+ Delete a URL watcher. The watcher will stop checking the URL immediately.
3474
+
3475
+ **Quick start:**
3476
+ ```bash
3477
+ curl -X DELETE "https://api.webpeel.dev/v1/watch/watch_a1b2c3d4" \
3478
+ -H "Authorization: Bearer wp_YOUR_KEY"
3479
+ ```
3480
+ tags: [Watch]
3481
+ parameters:
3482
+ - name: id
3483
+ in: path
3484
+ required: true
3485
+ description: Watcher ID to delete.
3486
+ schema:
3487
+ type: string
3488
+ example: "watch_a1b2c3d4"
3489
+ responses:
3490
+ "200":
3491
+ description: Watcher deleted successfully.
3492
+ content:
3493
+ application/json:
3494
+ schema:
3495
+ type: object
3496
+ required: [ok, deleted]
3497
+ properties:
3498
+ ok:
3499
+ type: boolean
3500
+ const: true
3501
+ deleted:
3502
+ type: string
3503
+ description: ID of the deleted watcher.
3504
+ example: "watch_a1b2c3d4"
3505
+ example:
3506
+ ok: true
3507
+ deleted: "watch_a1b2c3d4"
3508
+ "401":
3509
+ $ref: "#/components/responses/Unauthorized"
3510
+ "403":
3511
+ description: Access denied — watcher belongs to another account.
3512
+ content:
3513
+ application/json:
3514
+ schema:
3515
+ $ref: "#/components/schemas/Error"
3516
+ "404":
3517
+ description: Watcher not found.
3518
+ content:
3519
+ application/json:
3520
+ schema:
3521
+ $ref: "#/components/schemas/Error"
3522
+ "500":
3523
+ $ref: "#/components/responses/InternalError"
3524
+
3525
+ patch:
3526
+ operationId: updateWatch
3527
+ summary: Update a URL watcher
3528
+ description: Update an existing watcher. Pause or resume it, or change its webhook URL, check interval, or CSS selector.
3529
+ tags: [Watch]
3530
+ parameters:
3531
+ - name: id
3532
+ in: path
3533
+ required: true
3534
+ description: Watcher ID to update.
3535
+ schema:
3536
+ type: string
3537
+ example: "watch_a1b2c3d4"
3538
+ requestBody:
3539
+ required: true
3540
+ content:
3541
+ application/json:
3542
+ schema:
3543
+ type: object
3544
+ properties:
3545
+ status:
3546
+ type: string
3547
+ enum: [active, paused]
3548
+ description: Set to `paused` to pause checking, `active` to resume.
3549
+ webhookUrl:
3550
+ type: string
3551
+ format: uri
3552
+ description: New webhook URL.
3553
+ checkIntervalMinutes:
3554
+ type: integer
3555
+ minimum: 1
3556
+ maximum: 44640
3557
+ description: New check interval in minutes.
3558
+ selector:
3559
+ type: string
3560
+ description: New CSS selector for change detection scope.
3561
+ examples:
3562
+ pause:
3563
+ summary: Pause a watcher
3564
+ value:
3565
+ status: paused
3566
+ change_interval:
3567
+ summary: Change check interval
3568
+ value:
3569
+ checkIntervalMinutes: 30
3570
+ responses:
3571
+ "200":
3572
+ description: Watcher updated.
3573
+ content:
3574
+ application/json:
3575
+ schema:
3576
+ type: object
3577
+ required: [ok, watch]
3578
+ properties:
3579
+ ok:
3580
+ type: boolean
3581
+ const: true
3582
+ watch:
3583
+ $ref: "#/components/schemas/WatchEntry"
3584
+ "400":
3585
+ description: Invalid update parameters.
3586
+ content:
3587
+ application/json:
3588
+ schema:
3589
+ $ref: "#/components/schemas/Error"
3590
+ "401":
3591
+ $ref: "#/components/responses/Unauthorized"
3592
+ "403":
3593
+ description: Access denied. The watcher belongs to another account.
3594
+ content:
3595
+ application/json:
3596
+ schema:
3597
+ $ref: "#/components/schemas/Error"
3598
+ "404":
3599
+ description: Watcher not found.
3600
+ content:
3601
+ application/json:
3602
+ schema:
3603
+ $ref: "#/components/schemas/Error"
3604
+ "500":
3605
+ $ref: "#/components/responses/InternalError"
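Unlike the other Watch operations, `PATCH /v1/watch/{id}` has no quick start. A minimal sketch in the same curl style follows, using the body from the `pause` request example. The `update_watch` function name and the `WEBPEEL_API_KEY` environment variable are assumptions made for this example.

```bash
# JSON body that pauses a watcher (same as the `pause` request example).
pause_payload='{"status": "paused"}'

# Send a partial update for one watcher.
#   $1 = watcher ID, $2 = JSON body
# WEBPEEL_API_KEY is an assumed environment variable holding a wp_ key.
update_watch() {
  curl -sS -X PATCH "https://api.webpeel.dev/v1/watch/$1" \
    -H "Authorization: Bearer ${WEBPEEL_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$2"
}

# Usage (requires network access and a valid key):
#   update_watch watch_a1b2c3d4 "$pause_payload"
```

Sending `{"status": "active"}` through the same function would resume the watcher.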
3606
+
3607
+ /v1/watch/{id}/check:
3608
+ post:
3609
+ operationId: triggerWatchCheck
3610
+ summary: Manually trigger a content check
3611
+ description: Trigger an immediate content check for a watcher, outside its normal schedule.
3612
+ tags: [Watch]
3613
+ parameters:
3614
+ - name: id
3615
+ in: path
3616
+ required: true
3617
+ description: Watcher ID.
3618
+ schema:
3619
+ type: string
3620
+ example: "watch_a1b2c3d4"
3621
+ responses:
3622
+ "200":
3623
+ description: Check completed. The `diff` field holds the detected change, or `null` if the content is unchanged.
3624
+ content:
3625
+ application/json:
3626
+ schema:
3627
+ type: object
3628
+ required: [ok]
3629
+ properties:
3630
+ ok:
3631
+ type: boolean
3632
+ const: true
3633
+ diff:
3634
+ type: [object, "null"]
3635
+ description: Change diff if the content changed since the last check; `null` if nothing changed.
3637
+ properties:
3638
+ changed:
3639
+ type: boolean
3640
+ previous:
3641
+ type: string
3642
+ current:
3643
+ type: string
3644
+ diffText:
3645
+ type: string
3646
+ "401":
3647
+ $ref: "#/components/responses/Unauthorized"
3648
+ "403":
3649
+ description: Access denied. The watcher belongs to another account.
3650
+ content:
3651
+ application/json:
3652
+ schema:
3653
+ $ref: "#/components/schemas/Error"
3654
+ "404":
3655
+ description: Watcher not found.
3656
+ content:
3657
+ application/json:
3658
+ schema:
3659
+ $ref: "#/components/schemas/Error"
3660
+ "500":
3661
+ $ref: "#/components/responses/InternalError"
3662
+
3663
+ # ===========================================================================
3664
+ # Auth
3665
+ # ===========================================================================
3666
+
3667
+ /v1/auth/register:
3668
+ post:
3669
+ operationId: registerUser
3670
+ summary: Register a new account
3671
+ description: |
3672
+ Create a new WebPeel account. Returns the user profile and an API key.
3673
+
3674
+ > **Important:** The API key is returned **only once** in this response.
3675
+ > Store it securely — it cannot be retrieved again. If lost, create a new one
3676
+ > via the dashboard at [app.webpeel.dev](https://app.webpeel.dev).
3677
+
3678
+ **Password requirements:** 8–128 characters.
3679
+
3680
+ **Quick start:**
3681
+ ```bash
3682
+ curl -X POST https://api.webpeel.dev/v1/auth/register \
3683
+ -H "Content-Type: application/json" \
3684
+ -d '{"email": "user@example.com", "password": "MySecureP@ssw0rd"}'
3685
+ ```
3686
+ tags: [Auth]
3687
+ security: []
3688
+ requestBody:
3689
+ required: true
3690
+ content:
3691
+ application/json:
3692
+ schema:
3693
+ type: object
3694
+ required: [email, password]
3695
+ properties:
3696
+ email:
3697
+ type: string
3698
+ format: email
3699
+ description: Account email address.
3700
+ example: user@example.com
3701
+ password:
3702
+ type: string
3703
+ minLength: 8
3704
+ maxLength: 128
3705
+ description: Password (8–128 characters).
3706
+ example: "MySecureP@ssw0rd"
3707
+ example:
3708
+ email: user@example.com
3709
+ password: "MySecureP@ssw0rd"
3710
+ responses:
3711
+ "201":
3712
+ description: Account created successfully.
3713
+ content:
3714
+ application/json:
3715
+ schema:
3716
+ type: object
3717
+ required: [user, apiKey]
3718
+ properties:
3719
+ user:
3720
+ $ref: "#/components/schemas/UserObject"
3721
+ apiKey:
3722
+ type: string
3723
+ description: |
3724
+ Your WebPeel API key. **Shown only once — store it securely.**
3725
+ example: "wp_abc123def456..."
3726
+ example:
3727
+ user:
3728
+ id: "550e8400-e29b-41d4-a716-446655440000"
3729
+ email: user@example.com
3730
+ tier: free
3731
+ weeklyLimit: 500
3732
+ burstLimit: 50
3733
+ rateLimit: 10
3734
+ createdAt: "2025-02-25T08:00:00.000Z"
3735
+ apiKey: "wp_abc123def456ghi789..."
3736
+ "400":
3737
+ description: Invalid input (missing fields, invalid email format, or weak password).
3738
+ content:
3739
+ application/json:
3740
+ schema:
3741
+ $ref: "#/components/schemas/Error"
3742
+ examples:
3743
+ missing_fields:
3744
+ value:
3745
+ success: false
3746
+ error:
3747
+ type: invalid_request
3748
+ message: "Email and password are required"
3749
+ metadata:
3750
+ requestId: "req_abc123"
3751
+ invalid_email:
3752
+ value:
3753
+ success: false
3754
+ error:
3755
+ type: invalid_request
3756
+ message: "Invalid email format"
3757
+ metadata:
3758
+ requestId: "req_abc123"
3759
+ weak_password:
3760
+ value:
3761
+ success: false
3762
+ error:
3763
+ type: invalid_request
3764
+ message: "Password must be at least 8 characters"
3765
+ metadata:
3766
+ requestId: "req_abc123"
3767
+ "409":
3768
+ description: Email already registered.
3769
+ content:
3770
+ application/json:
3771
+ schema:
3772
+ $ref: "#/components/schemas/Error"
3773
+ example:
3774
+ success: false
3775
+ error:
3776
+ type: invalid_request
3777
+ message: "Email already registered"
3778
+ metadata:
3779
+ requestId: "req_abc123"
3780
+ "500":
3781
+ $ref: "#/components/responses/InternalError"
3782
+
3783
+ /v1/auth/login:
3784
+ post:
3785
+ operationId: loginUser
3786
+ summary: Log in with email and password
3787
+ description: |
3788
+ Authenticate with email and password. Returns a short-lived JWT access token
3789
+ (1 hour) and a long-lived refresh token (30 days).
3790
+
3791
+ Use the JWT access token to authenticate requests to user management endpoints.
3792
+ Use your API key (returned at registration) to authenticate API requests.
3793
+
3794
+ **Rate limiting:** Limited to 5 attempts per email per 15 minutes.
3795
+
3796
+ **Quick start:**
3797
+ ```bash
3798
+ curl -X POST https://api.webpeel.dev/v1/auth/login \
3799
+ -H "Content-Type: application/json" \
3800
+ -d '{"email": "user@example.com", "password": "MySecureP@ssw0rd"}'
3801
+ ```
3802
+ tags: [Auth]
3803
+ security: []
3804
+ requestBody:
3805
+ required: true
3806
+ content:
3807
+ application/json:
3808
+ schema:
3809
+ type: object
3810
+ required: [email, password]
3811
+ properties:
3812
+ email:
3813
+ type: string
3814
+ format: email
3815
+ example: user@example.com
3816
+ password:
3817
+ type: string
3818
+ example: "MySecureP@ssw0rd"
3819
+ example:
3820
+ email: user@example.com
3821
+ password: "MySecureP@ssw0rd"
3822
+ responses:
3823
+ "200":
3824
+ description: Login successful.
3825
+ content:
3826
+ application/json:
3827
+ schema:
3828
+ type: object
3829
+ required: [token, refreshToken, expiresIn, user]
3830
+ properties:
3831
+ token:
3832
+ type: string
3833
+ description: JWT access token (valid for 1 hour).
3834
+ example: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
3835
+ refreshToken:
3836
+ type: string
3837
+ description: Refresh token (valid for 30 days).
3838
+ example: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
3839
+ expiresIn:
3840
+ type: integer
3841
+ description: Access token lifetime in seconds.
3842
+ example: 3600
3843
+ user:
3844
+ type: object
3845
+ properties:
3846
+ id:
3847
+ type: string
3848
+ format: uuid
3849
+ email:
3850
+ type: string
3851
+ format: email
3852
+ tier:
3853
+ type: string
3854
+ enum: [free, starter, pro, enterprise]
3855
+ example:
3856
+ token: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
3857
+ refreshToken: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
3858
+ expiresIn: 3600
3859
+ user:
3860
+ id: "550e8400-e29b-41d4-a716-446655440000"
3861
+ email: user@example.com
3862
+ tier: free
3863
+ "400":
3864
+ description: Missing email or password.
3865
+ content:
3866
+ application/json:
3867
+ schema:
3868
+ $ref: "#/components/schemas/Error"
3869
+ example:
3870
+ success: false
3871
+ error:
3872
+ type: invalid_request
3873
+ message: "Email and password are required"
3874
+ metadata:
3875
+ requestId: "req_abc123"
3876
+ "401":
3877
+ description: Invalid credentials.
3878
+ content:
3879
+ application/json:
3880
+ schema:
3881
+ $ref: "#/components/schemas/Error"
3882
+ example:
3883
+ success: false
3884
+ error:
3885
+ type: unauthorized
3886
+ message: "Invalid email or password"
3887
+ metadata:
3888
+ requestId: "req_abc123"
3889
+ "429":
3890
+ description: Too many login attempts.
3891
+ content:
3892
+ application/json:
3893
+ schema:
3894
+ $ref: "#/components/schemas/Error"
3895
+ example:
3896
+ success: false
3897
+ error:
3898
+ type: rate_limited
3899
+ message: "Too many login attempts. Please try again in 15 minutes."
3900
+ hint: "Wait 15 minutes before retrying."
3901
+ metadata:
3902
+ requestId: "req_abc123"
3903
+ "500":
3904
+ $ref: "#/components/responses/InternalError"
3905
+
3906
+ /v1/auth/refresh:
3907
+ post:
3908
+ operationId: refreshToken
3909
+ summary: Refresh access token
3910
+ description: |
3911
+ Exchange a valid refresh token for a new access token and refresh token.
3912
+ The old refresh token is immediately invalidated (rotation).
3913
+
3914
+ **Rate limiting:** Limited to 10 attempts per IP per 15 minutes.
3915
+ tags: [Auth]
3916
+ security: []
3917
+ requestBody:
3918
+ required: true
3919
+ content:
3920
+ application/json:
3921
+ schema:
3922
+ type: object
3923
+ required: [refreshToken]
3924
+ properties:
3925
+ refreshToken:
3926
+ type: string
3927
+ description: The refresh token received from login.
3928
+ example: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
3929
+ example:
3930
+ refreshToken: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
3931
+ responses:
3932
+ "200":
3933
+ description: New access token and refresh token issued.
3934
+ content:
3935
+ application/json:
3936
+ schema:
3937
+ type: object
3938
+ required: [token, refreshToken, expiresIn]
3939
+ properties:
3940
+ token:
3941
+ type: string
3942
+ description: New JWT access token (valid for 1 hour).
3943
+ refreshToken:
3944
+ type: string
3945
+ description: New refresh token (valid for 30 days). The previous refresh token is invalidated.
3946
+ expiresIn:
3947
+ type: integer
+ description: Access token lifetime in seconds.
3948
+ example: 3600
3949
+ example:
3950
+ token: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
3951
+ refreshToken: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
3952
+ expiresIn: 3600
3953
+ "400":
3954
+ description: Missing refresh token.
3955
+ content:
3956
+ application/json:
3957
+ schema:
3958
+ $ref: "#/components/schemas/Error"
3959
+ "401":
3960
+ description: Invalid or expired refresh token.
3961
+ content:
3962
+ application/json:
3963
+ schema:
3964
+ $ref: "#/components/schemas/Error"
3965
+ "429":
3966
+ description: Too many refresh attempts.
3967
+ content:
3968
+ application/json:
3969
+ schema:
3970
+ $ref: "#/components/schemas/Error"
3971
+ "500":
3972
+ $ref: "#/components/responses/InternalError"
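The rotation described for `POST /v1/auth/refresh` can be sketched in the same curl style as the other Auth quick starts. The function names below are hypothetical. The payload builder does no JSON escaping, on the assumption that the refresh token is a JWT-style string with no quotes or backslashes.

```bash
# Build the JSON body for POST /v1/auth/refresh.
# Assumes the token needs no JSON escaping (JWT-style characters only).
refresh_payload() {
  printf '{"refreshToken": "%s"}' "$1"
}

# Exchange a refresh token for a new token pair.
# No Authorization header is sent: the operation declares `security: []`.
refresh_tokens() {
  curl -sS -X POST "https://api.webpeel.dev/v1/auth/refresh" \
    -H "Content-Type: application/json" \
    -d "$(refresh_payload "$1")"
}

# Usage (requires network access):
#   refresh_tokens "$REFRESH_TOKEN"
```

Because the old refresh token is invalidated on success, a caller should store the returned `refreshToken` before discarding the previous one.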
3973
+
3974
+ # ===========================================================================
3975
+ # MCP
3976
+ # ===========================================================================
3977
+
3978
+ /mcp:
3979
+ post:
3980
+ operationId: mcpPost
3981
+ summary: MCP endpoint (Streamable HTTP)
3982
+ description: |
3983
+ Model Context Protocol (MCP) endpoint using [Streamable HTTP transport](https://modelcontextprotocol.io/).
3984
+ Accepts JSON-RPC 2.0 messages and returns MCP-formatted responses.
3985
+
3986
+ Connect your AI assistant:
3987
+ ```json
3988
+ { "url": "https://api.webpeel.dev/mcp" }
3989
+ ```
3990
+
3991
+ **Available MCP tools:**
3992
+ - `webpeel_fetch` — Fetch any URL
3993
+ - `webpeel_search` — Web search
3994
+ - `webpeel_crawl` — Crawl a website
3995
+ - `webpeel_map` — Discover all URLs on a domain
3996
+ - `webpeel_extract` — LLM-powered structured extraction (BYOK)
3997
+ - `webpeel_auto_extract` — Heuristic auto-extraction (no LLM)
3998
+ - `webpeel_batch` — Fetch multiple URLs concurrently
3999
+ - `webpeel_screenshot` — Take a screenshot
4000
+ - `webpeel_youtube` — Extract YouTube transcripts
4001
+ - `webpeel_quick_answer` — BM25 Q&A (no LLM)
4002
+ - `webpeel_deep_fetch` — Multi-source research
4003
+ - `webpeel_watch` — Monitor URLs for changes
4004
+ tags: [MCP]
4005
+ requestBody:
4006
+ required: true
4007
+ content:
4008
+ application/json:
4009
+ schema:
4010
+ type: object
4011
+ description: MCP JSON-RPC 2.0 request.
4012
+ required: [jsonrpc, method]
4013
+ properties:
4014
+ jsonrpc:
4015
+ type: string
4016
+ const: "2.0"
4017
+ method:
4018
+ type: string
4019
+ description: JSON-RPC method name.
4020
+ example: tools/call
4021
+ params:
4022
+ type: object
4023
+ description: Method parameters.
4024
+ id:
4025
+ oneOf:
4026
+ - type: string
4027
+ - type: integer
4028
+ - type: "null"
4029
+ examples:
4030
+ list_tools:
4031
+ summary: List available tools
4032
+ value:
4033
+ jsonrpc: "2.0"
4034
+ method: tools/list
4035
+ id: 1
4036
+ call_fetch:
4037
+ summary: Fetch a URL via MCP
4038
+ value:
4039
+ jsonrpc: "2.0"
4040
+ method: tools/call
4041
+ params:
4042
+ name: webpeel_fetch
4043
+ arguments:
4044
+ url: https://example.com
4045
+ format: markdown
4046
+ budget: 4000
4047
+ id: 2
4048
+ responses:
4049
+ "200":
4050
+ description: MCP JSON-RPC response.
4051
+ content:
4052
+ application/json:
4053
+ schema:
4054
+ type: object
4055
+ description: MCP JSON-RPC 2.0 response.
4056
+ required: [jsonrpc, id]
4057
+ properties:
4058
+ jsonrpc:
4059
+ type: string
4060
+ const: "2.0"
4061
+ result:
4062
+ type: object
4063
+ description: Result on success.
4064
+ error:
4065
+ type: object
4066
+ description: Error on failure.
4067
+ properties:
4068
+ code:
4069
+ type: integer
4070
+ message:
4071
+ type: string
4072
+ id:
4073
+ oneOf:
4074
+ - type: string
4075
+ - type: integer
4076
+ - type: "null"
4077
+ example:
4078
+ jsonrpc: "2.0"
4079
+ result:
4080
+ content:
4081
+ - type: text
4082
+ text: '{"url":"https://example.com","title":"Example Domain","content":"# Example Domain\n\n..."}'
4083
+ id: 2
4084
+ "401":
4085
+ description: Authentication required.
4086
+ content:
4087
+ application/json:
4088
+ schema:
4089
+ type: object
4090
+ example:
4091
+ jsonrpc: "2.0"
4092
+ error:
4093
+ code: -32001
4094
+ message: "Authentication required. Pass API key via Authorization: Bearer <key> header."
4095
+ id: null
4096
+ "405":
4097
+ description: Method not allowed.
4098
+ content:
4099
+ application/json:
4100
+ schema:
4101
+ type: object
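The `list_tools` request example can be sent with curl as sketched below. The function names and the `WEBPEEL_API_KEY` environment variable are assumptions made for this example. The `Accept` header follows what the MCP Streamable HTTP transport expects from clients; this spec does not say whether the server enforces it, or whether an `initialize` handshake must come first.

```bash
# Build a JSON-RPC 2.0 request body for a method that takes no params.
#   $1 = method name, $2 = integer request id
jsonrpc_request() {
  printf '{"jsonrpc": "2.0", "method": "%s", "id": %d}' "$1" "$2"
}

# POST a JSON-RPC body to the MCP endpoint.
# WEBPEEL_API_KEY is an assumed environment variable holding a wp_ key.
mcp_call() {
  curl -sS -X POST "https://api.webpeel.dev/mcp" \
    -H "Authorization: Bearer ${WEBPEEL_API_KEY}" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d "$1"
}

# Usage (requires network access and a valid key):
#   mcp_call "$(jsonrpc_request tools/list 1)"
```

Requests that carry `params`, such as the `call_fetch` example, need a fuller body than this builder produces.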
4102
+
4103
+ get:
4104
+ operationId: mcpGetNotAllowed
4105
+ summary: MCP endpoint (GET — not allowed)
4106
+ description: Always responds with `405 Method Not Allowed`. Use `POST` to send MCP JSON-RPC messages.
4107
+ tags: [MCP]
4108
+ responses:
4109
+ "405":
4110
+ description: Method not allowed.
4111
+ content:
4112
+ application/json:
4113
+ schema:
4114
+ type: object
4115
+
4116
+ /v2/mcp:
4117
+ post:
4118
+ operationId: mcpPostV2
4119
+ summary: MCP endpoint v2 (canonical)
4120
+ description: |
4121
+ Canonical v2 MCP endpoint. Identical behaviour to `POST /mcp`.
4122
+
4123
+ Connect your AI assistant:
4124
+ ```json
4125
+ { "url": "https://api.webpeel.dev/v2/mcp" }
4126
+ ```
4127
+
4128
+ See `POST /mcp` for full documentation and available tools.
4129
+ tags: [MCP]
4130
+ requestBody:
4131
+ required: true
4132
+ content:
4133
+ application/json:
4134
+ schema:
4135
+ type: object
4136
+ description: MCP JSON-RPC 2.0 request.
4137
+ required: [jsonrpc, method]
4138
+ properties:
4139
+ jsonrpc:
4140
+ type: string
4141
+ const: "2.0"
4142
+ method:
4143
+ type: string
4144
+ params:
4145
+ type: object
4146
+ id:
4147
+ oneOf:
4148
+ - type: string
4149
+ - type: integer
4150
+ - type: "null"
4151
+ responses:
4152
+ "200":
4153
+ description: MCP JSON-RPC response.
4154
+ content:
4155
+ application/json:
4156
+ schema:
4157
+ type: object
4158
+ "401":
4159
+ description: Authentication required.
4160
+ content:
4161
+ application/json:
4162
+ schema:
4163
+ type: object
4164
+ "405":
4165
+ description: Method not allowed.
4166
+ content:
4167
+ application/json:
4168
+ schema:
4169
+ type: object
4170
+
4171
+ get:
4172
+ operationId: mcpGetV2NotAllowed
4173
+ summary: MCP v2 endpoint (GET — not allowed)
4174
+ description: Always responds with `405 Method Not Allowed`. Use `POST` to send MCP JSON-RPC messages.
4175
+ tags: [MCP]
4176
+ responses:
4177
+ "405":
4178
+ description: Method not allowed.
4179
+ content:
4180
+ application/json:
4181
+ schema:
4182
+ type: object