wsper-js 0.1.1 → 0.1.2-wc1
- package/README.md +295 -5
- package/dist/index.d.ts +2530 -62
- package/dist/index.js +1 -1
- package/package.json +3 -2
package/README.md
CHANGED
@@ -8,7 +8,7 @@
 [](https://www.npmjs.com/package/wsper-js)
 
 
-
 
 </div>
 
@@ -83,7 +83,7 @@ const video = await wsper.youtube.getVideo("dQw4w9WgXcQ");
 console.log(track.ok, video.ok);
 ```
 
-`WsperScraper` currently exposes `spotify`, `twitter`, `threads`, `instagram`, `pinterest`, and `
+`WsperScraper` currently exposes `spotify`, `twitter`, `threads`, `instagram`, `pinterest`, `youtube`, and `cai`. Other scrapers are exported as named classes.
 
 ## Response Shape
 
@@ -126,6 +126,37 @@ if (response.ok && response.data !== null) {
 
 The examples below use public exports from `src/index.ts` and representative response fields from implementation and tests.
 
+### Public-Link Examples
+
+Some scrapers need live public targets. The bundled examples avoid dummy URLs:
+
+```bash
+npx tsx examples/gdrive.example.ts
+npx tsx examples/foreign-news.example.ts
+npx tsx examples/pubgmobile.example.ts
+npx tsx examples/soundcloud.example.ts
+npx tsx examples/usgs-earthquake.example.ts all_day
+npx tsx examples/open-meteo.example.ts -6.2 106.8
+npx tsx examples/restcountries.example.ts Indonesia
+npx tsx examples/hacker-news.example.ts 10
+npx tsx examples/open-library.example.ts dune
+npx tsx examples/gutendex.example.ts alice
+npx tsx examples/world-bank.example.ts IDN SP.POP.TOTL
+npx tsx examples/ecb.example.ts 100 USD
+npx tsx examples/crossref.example.ts "open science"
+npx tsx examples/openalex.example.ts "open science"
+npx tsx examples/ror.example.ts "University of Indonesia"
+npx tsx examples/clinical-trials.example.ts diabetes
+npx tsx examples/pypi.example.ts requests
+npx tsx examples/packagist.example.ts monolog/monolog
+npx tsx examples/osv.example.ts PyPI requests
+npx tsx examples/binance.example.ts BTCUSDT
+```
+
+`GDriveScraper` defaults to a verified public Google Drive PDF in the example. `ForeignNewsScraper` uses the BBC World RSS feed. `PubgMobileScraper` uses the official server-rendered announcement list because the public `news.shtml` page is a client-rendered shell. Scrapers that commonly require a valid session or cookie, such as TeraBox, Pixiv, and Xiaohongshu, require an explicit URL/ID in their examples instead of using placeholder links.
+
+When a live endpoint returns HTTP 200 but no parseable data, these scrapers return `ok: false` with a parser error code instead of returning `ok: true` with empty data.
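Annotation on the added note above: it describes a contract in which an HTTP 200 response with no parseable data is reported as `ok: false` rather than as an empty success. The sketch below illustrates that contract in isolation. It does not use wsper-js; the `Envelope` type, the `error.code` field, the `HTTP_ERROR` and `PARSE_EMPTY` codes, and the `parseFeed` helper are hypothetical stand-ins, since this diff only shows the `ok` and `data` fields.

```typescript
// Hypothetical local mirror of the response envelope. Only `ok` and `data`
// appear in the diff; the `error` object and its codes are assumptions.
type Envelope<T> =
  | { ok: true; data: T }
  | { ok: false; data: null; error: { code: string; message: string } };

interface FeedItem {
  title: string;
  link: string;
}

// Stand-in for a scraper's parse step: a 200 response that yields no items
// is reported as a failure, not as an empty success.
function parseFeed(httpStatus: number, items: FeedItem[]): Envelope<FeedItem[]> {
  if (httpStatus !== 200) {
    return {
      ok: false,
      data: null,
      error: { code: "HTTP_ERROR", message: `unexpected status ${httpStatus}` },
    };
  }
  if (items.length === 0) {
    return {
      ok: false,
      data: null,
      error: { code: "PARSE_EMPTY", message: "no parseable data in response" },
    };
  }
  return { ok: true, data: items };
}

// Caller-side guard, using the same `ok` and `data !== null` check that the
// README shows for real responses.
function titlesOrEmpty(response: Envelope<FeedItem[]>): string[] {
  if (response.ok && response.data !== null) {
    return response.data.map((item) => item.title);
  }
  return [];
}

const emptyResult = parseFeed(200, []);
const filledResult = parseFeed(200, [
  { title: "Example headline", link: "https://example.com/a" },
]);
console.log(emptyResult.ok, titlesOrEmpty(filledResult));
```

With this shape a caller never has to distinguish "succeeded with nothing" from "failed to parse"; the single `ok` check covers both.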
+
 ### LyricsScraper
 
 Source: LRCLIB JSON API at `https://lrclib.net/api/search`.
@@ -424,54 +455,225 @@ Representative output:
 }
 ```
 
+### CaiScraper
+
+Source: Character.AI WebSocket & API integrations.
+
+```ts
+import { CaiScraper } from "wsper-js";
+
+const scraper = new CaiScraper({
+  credentials: {
+    bearerToken: "YOUR_CHARACTER_AI_TOKEN",
+  },
+});
+
+// 1. Search for a character
+const search = await scraper.searchCharacters("Mario");
+console.log(search.data?.[0]);
+
+// 2. Chat with a character (optionally using a custom voice ID)
+const chat = await scraper.chat({
+  characterId: "rGKdvZewGUZEJFEQPEBMS5JLQOTOrxi-8ByLFsGmgQM",
+  message: "Hello Mario!",
+  voiceId: "4fdd6bc1-c659-4587-b462-53f569b39078",
+});
+console.log(chat.data?.text);
+console.log(chat.data?.audioUrl); // Generated TTS audio URL
+
+// Chat sessions stay connected per character and are reused by later chat()
+// calls. Disconnect them when your app is done with the character.
+await scraper.disconnectCharacterSession("rGKdvZewGUZEJFEQPEBMS5JLQOTOrxi-8ByLFsGmgQM");
+// Or:
+await scraper.disconnectAllCharacterSessions();
+```
+
+Representative output:
+
+```json
+{
+  "text": "It's-a me, Mario!",
+  "audioUrl": "https://storage.googleapis.com/.../voice.mp3",
+  "raw": { ... }
+}
+```
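Annotation on the added `CaiScraper` example above: because chat sessions stay connected per character until they are disconnected, a caller may want the disconnect to run even when a `chat()` call throws. The sketch below shows one way to do that with `try`/`finally`. It runs against an in-memory stub rather than wsper-js; the `ChatSessionClient` interface and the `chatThenDisconnect` helper are hypothetical, and only the method names `chat` and `disconnectCharacterSession` come from the README example. The return shapes are simplified assumptions.

```typescript
// Minimal structural type covering only the two methods used below.
// Method names follow the README example; return shapes are simplified.
interface ChatSessionClient {
  chat(input: {
    characterId: string;
    message: string;
  }): Promise<{ ok: boolean; data: { text: string } | null }>;
  disconnectCharacterSession(characterId: string): Promise<unknown>;
}

// Hypothetical helper: send several messages to one character, then always
// disconnect that character's session, even if a chat() call throws.
async function chatThenDisconnect(
  client: ChatSessionClient,
  characterId: string,
  messages: string[],
): Promise<string[]> {
  const replies: string[] = [];
  try {
    for (const message of messages) {
      const response = await client.chat({ characterId, message });
      if (response.ok && response.data !== null) {
        replies.push(response.data.text);
      }
    }
  } finally {
    await client.disconnectCharacterSession(characterId);
  }
  return replies;
}

// In-memory stub so the sketch runs without a token or network access.
const disconnectedIds: string[] = [];
const stubClient: ChatSessionClient = {
  async chat({ message }) {
    return { ok: true, data: { text: `echo: ${message}` } };
  },
  async disconnectCharacterSession(characterId) {
    disconnectedIds.push(characterId);
  },
};

const repliesPromise = chatThenDisconnect(stubClient, "character-1", ["Hello", "Bye"]);
repliesPromise.then((replies) => console.log(replies, disconnectedIds));
```

Anything that structurally provides those two methods could be passed as `client`; whether the real `CaiScraper` instance matches this simplified shape is not confirmed by the diff.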
+
 ## Available Scrapers
 
+This table is audited against the public scraper exports in `src/scrapers/index.ts`.
+
 | Scraper | Purpose | Source/API | Auth/Cookie required? | Example file | Notes |
 | --- | --- | --- | --- | --- | --- |
+| `AioScraper` | Resolve public media download options | `allinonedownloader.com` | No | `examples/aio.example.ts` | `download(url)`; use only authorized public media URLs |
 | `AlkitabScraper` | Bible verse search | `alkitab.me` | No | `examples/alkitab.example.ts` | `search(query)` |
 | `AnimeQuoteScraper` | Random anime quote | `otakotaku.com` | No | `examples/anime-quote.example.ts` | `getRandom()` |
 | `AnimeRandomScraper` | Random anime character image | GitHub raw anime dataset | No | `examples/anime-random.example.ts` | `getImage(character)`, `random()` |
+| `ArenaAiScraper` | AI model leaderboard categories and rankings | `api.wulong.dev/arena-ai-leaderboards/v1` | No | `examples/arena-ai.example.ts` | `getCategories`, `getLeaderboard` |
 | `BiliBiliScraper` | BiliBili search and video info | `api.bilibili.com` | Optional cookie | `examples/bilibili.example.ts` | Cookie may unlock authenticated stream access |
+| `BinanceScraper` | Binance public market data | Binance Spot market data API | No | `examples/binance.example.ts` | Market data only: ticker, average price, klines, order book; no trading/account endpoints |
+| `BimasIslamScraper` | Indonesian Kemenag prayer time regions and schedules | `bimasislam.kemenag.go.id` | No | `examples/bimas-islam.example.ts` | `getProvinces`, `getCities`, `getPrayerTimes` |
 | `BMKGScraper` | Indonesian earthquake and weather feeds | `data.bmkg.go.id`, `nowcasting.bmkg.go.id` | No | `examples/bmkg.example.ts` | Autogempa, gempa dirasakan, nowcasting, forecast |
+| `CaiScraper` | Character.AI search, chat, and voice details | `neo.character.ai` API & WebSockets | Yes (Token required) | `examples/cai.example.ts` | `searchCharacters`, `chat`, `getVoice` |
 | `CapCutScraper` | Resolve CapCut template video URL | `capdownloader.com/wp-json/aio-dl/video-data/` | No | `examples/capcut.example.ts` | Mocked in example runner |
+| `ClinicalTrialsScraper` | ClinicalTrials.gov public study search and detail lookup | ClinicalTrials.gov API v2 | No | `examples/clinical-trials.example.ts` | `searchStudies`, `getStudy`; public registry metadata only |
+| `CodashopScraper` | Game nickname verification | `order-sg.codashop.com` | No | `examples/codashop.example.ts` | Supported game names are validated in the scraper |
+| `CrossrefScraper` | Scholarly work and journal metadata | Crossref REST API | No | `examples/crossref.example.ts` | `searchWorks`, `getWorkByDoi`, `searchJournals`; optional polite `mailto` option |
 | `CuacaScraper` | Indonesian weather by location/coordinate | BMKG weather APIs | Optional API key for warnings | `examples/cuaca.example.ts` | Reads optional `BMKG_WARNING_API_KEY` in example |
+| `DetikNewsScraper` | Detik news search | `detik.com` search HTML | No | `examples/detik-news.example.ts` | `search(query, resultType)` |
+| `DonghubScraper` | Drama search and detail pages | `donghub.vip` | No | `examples/donghub.example.ts` | `search`, `getDetail` |
+| `DownrScraper` | Resolve public video download options | `downr.org` Netlify functions | No | `examples/downr.example.ts` | `getVideo(url)`; use only authorized public media URLs |
 | `DrakorScraper` | Korean drama search/list/detail | `drakorkita30.kita.baby` | No | `examples/drakor.example.ts` | `search`, `detail`, `ongoing`, `getAll` |
 | `DramaboxScraper` | Dramabox search | `dramabox.com` | No | `examples/dramabox.example.ts` | `search(query)` |
+| `EcbScraper` | ECB euro foreign exchange reference rates | ECB eurofxref XML feed | No | `examples/ecb.example.ts` | `getDailyReferenceRates`, `convertFromEuro`; no investment advice |
+| `FacebookScraper` | Facebook public profile/post metadata and image download | `facebook.com`, `mbasic.facebook.com` | No credential option in current scraper | `examples/facebook.example.ts` | Public pages only; no session extraction |
 | `FaceswapScraper` | Face-swap image processing | `api.lovefaceswap.com` | No | `examples/faceswap.example.ts` | Mocked in example runner |
+| `FaceswapV2Scraper` | Face-swap image processing via URL inputs | `supawork.ai` headshot API | No | `examples/faceswap-v2.example.ts` | `swap(targetImageUrl, targetFaceUrl)` |
+| `ForeignNewsScraper` | RSS news feeds | BBC, CNBC Indonesia, CNN Indonesia, Antara, Republika | No | `examples/foreign-news.example.ts` | `getBbcNews`, `getCnbcNews`, `getCnnNews`, `getAntaraNews`, `getRepublikaNews` |
+| `GDriveScraper` | Public Google Drive file metadata and direct download URL | `drive.google.com/uc` | No for public files | `examples/gdrive.example.ts` | `getFileInfo(fileIdOrUrl)` |
+| `GenshinImpactScraper` | Genshin official manga and Hoyowiki entries | `genshin.hoyoverse.com`, `sg-public-api.hoyolab.com` | No | `examples/genshinimpact.example.ts` | `getMangaChapters`, `getWikiCategories`, `getWikiEntries` |
+| `GoqrScraper` | QR code image generation | `api.qrserver.com` | No | `examples/goqr.example.ts` | Returns image `Buffer` |
+| `GutendexScraper` | Project Gutenberg book metadata search | Gutendex JSON API | No | `examples/gutendex.example.ts` | `searchBooks`, `getBook`; metadata only, no bulk book downloads |
+| `HackerNewsScraper` | Hacker News stories, items, and users | Hacker News Firebase API | No | `examples/hacker-news.example.ts` | `getTopStories`, `getNewStories`, `getBestStories`, `getItem`, `getUser` |
 | `HokInfoScraper` | Honor of Kings character info | Fandom MediaWiki parse API | No | `examples/hok-info.example.ts` | Uses `api.php?action=parse` |
 | `HtmlToJpgScraper` | HTML file to JPG conversion | `api.freeconvert.com` | No credential in code | `examples/html-to-jpg.example.ts` | File-based conversion; skipped in runner without fixture |
 | `IkiruMangaScraper` | Manga search | `02.ikiru.wtf` | No | `examples/ikiru-manga.example.ts` | Mock server fallback available |
 | `ImageScraper` | Safebooru image search | `safebooru.org` | No | `examples/image.example.ts` | Current site type: `safebooru` |
+| `ImgflipScraper` | Meme templates and caption generation | `api.imgflip.com` | Credentials required for `captionMeme` only | `examples/imgflip.example.ts` | `getMemes`, `captionMeme` |
 | `ImgUpscalerScraper` | Image upscaling | `get1.imglarger.com` | No | `examples/img-upscaler.example.ts` | Mocked in example runner |
 | `InstagramScraper` | Profile, feed, post, download | `instagram.com` web/API endpoints | Internal defaults; custom cookie supported | `examples/instagram.example.ts` | Use only legitimate session cookies |
+| `KemendagScraper` | Indonesian basic goods dataset discovery | `data.go.id` CKAN API | No | `examples/kemendag.example.ts` | `getBapokDatasets()` |
 | `KomikindoScraper` | Manga search/detail | `komikindo.ch` | No | `examples/komikindo.example.ts` | `search`, `getDetail` |
+| `KompasNewsScraper` | Kompas news search | `search.kompas.com` | No | `examples/kompas-news.example.ts` | `search(query)` |
+| `LikeeScraper` | Likee public video metadata | `likeedownloader.com/process` | No | `examples/likee.example.ts` | `getInfo(url)` |
 | `LyricsScraper` | Lyrics search | `lrclib.net` JSON API | No | `examples/lyrics.example.ts` | Replaced blocked HTML scraping with API integration |
-| `MConverterScraper` | File conversion helpers | `mconverter.eu` | No | `examples/mconverter.example.ts` | `getTargets`, `convert`, `convertBuffer` |
 | `McAddonScraper` | Minecraft addon search/detail | `mmcreviews.com` | No | `examples/mcaddon.example.ts` | `search`, `getDetail`, `getAddon` |
+| `MConverterScraper` | File conversion helpers | `mconverter.eu` | No | `examples/mconverter.example.ts` | `getTargets`, `convert`, `convertBuffer` |
 | `MediafireScraper` | Resolve Mediafire download link | Mediafire public HTML page | No | `examples/mediafire.example.ts` | Default example uses active 10MB test file |
 | `ModAndroidScraper` | Android APK/mod search aggregations | `an1.com`, `modyolo.com`, `aptoide.com`, `uptodown.com` | No | `examples/mod-android.example.ts` | `android1`, `modyolo`, `aptoide`, `uptodown`, `searchAll` |
+| `MyAnimeListScraper` | Anime search and top anime via Jikan | `api.jikan.moe/v4` | No | `examples/myanimelist.example.ts` | `search`, `getTopAnime` |
+| `NanoBananaScraper` | AI image edit workflow | `app.live3d.io` | No | `examples/nanobanana.example.ts` | Buffer upload, job creation, and polling |
 | `OcrScraper` | OCR image scan | `newocr.com` | No | `examples/ocr.example.ts` | File/buffer based |
+| `OpenAlexScraper` | Scholarly works, authors, and institutions | OpenAlex API | No | `examples/openalex.example.ts` | `searchWorks`, `getWork`, `searchAuthors`, `searchInstitutions`; optional polite `mailto` option |
+| `OpenMeteoScraper` | Weather forecast and current weather by coordinate | Open-Meteo forecast API | No | `examples/open-meteo.example.ts` | `getForecast`, `getCurrentWeather`; validates coordinates and forecast bounds |
+| `OpenLibraryScraper` | Open Library book, work, author, and ISBN lookup | Open Library JSON APIs | No | `examples/open-library.example.ts` | `searchBooks`, `getWork`, `getAuthor`, `getByIsbn` |
+| `OsvScraper` | Open Source Vulnerabilities package and vulnerability lookup | OSV API v1 | No | `examples/osv.example.ts` | `queryPackage`, `getVulnerability`; public advisory metadata |
+| `PackagistScraper` | PHP Composer package metadata and search | Packagist search API and Composer metadata API | No | `examples/packagist.example.ts` | `searchPackages`, `getPackage` |
 | `PhotoAiScraper` | Photo AI upload/status | `photoai.imglarger.com` | No | `examples/photo-ai.example.ts` | Mocked in example runner |
 | `PinterestScraper` | Pin search, detail, download | `pinterest.com` | Internal defaults; custom cookie supported | `examples/pinterest.example.ts` | Supports `credentials` option |
+| `PixivScraper` | Pixiv artwork metadata | `pixiv.net/ajax/illust` | Internal defaults; custom credentials supported | `examples/pixiv.example.ts` | `getIllust(illustId)` |
 | `PlayStoreScraper` | Google Play app search | `play.google.com` | No | `examples/playstore.example.ts` | `search(query, limit)` |
+| `PubgMobileScraper` | PUBG Mobile announcement list | `pubgmobile.com` server-rendered news path | No | `examples/pubgmobile.example.ts` | `getNews()` |
+| `PypiScraper` | Python package and release metadata | PyPI JSON API | No | `examples/pypi.example.ts` | `getPackage`, `getRelease` |
+| `RemovebgScraper` | Background removal | Official `remove.bg` API | Yes (API key) | `examples/removebg.example.ts` | Prefer constructor credentials with `apiKey` |
 | `ResepScraper` | Recipe search | `cookpad.com` | No | `examples/resep.example.ts` | Returns recipe items |
+| `RestCountriesScraper` | Country lookup by name/code and bounded country lists | REST Countries v3.1 API | No | `examples/restcountries.example.ts` | `getAll(fields)` requires explicit fields to keep payloads bounded |
+| `RorScraper` | Research organization registry search and lookup | ROR REST API | No | `examples/ror.example.ts` | `searchOrganizations`, `getOrganization` |
 | `SakuraNovelScraper` | Novel search/detail/chapter | `sakuranovel.id` | No | `examples/sakura-novel.example.ts` | Mock server fallback available |
+| `SfileScraper` | Sfile public file metadata | `sfile.mobi`, `sfile.co` | No | Not yet available | `getMetadata(url)` |
+| `SoundcloudScraper` | SoundCloud track metadata | `soundcloud.com`, `api-v2.soundcloud.com` | No | `examples/soundcloud.example.ts` | Extracts or falls back to a client ID |
 | `SpotifyScraper` | Spotify track, album, playlist, search, downloads | Spotify Web API and Accounts API | Internal defaults; custom client credentials supported | `examples/spotify.example.ts` | Partial custom credentials are rejected |
 | `StalkScraper` | npm package metadata lookup | `registry.npmjs.org` | No | `examples/stalk.example.ts` | Default example query is `axios` |
+| `TeraboxScraper` | TeraBox public share listing | `terabox.com/share/list` | Internal defaults; custom credentials supported | `examples/terabox.example.ts` | `getShareList(url)` |
+| `TextReplaceScraper` | Replace text in images | `imgupscaler.ai`, `magiceraser.org` APIs | No | `examples/text-replace.example.ts` | Buffer upload, job creation, and polling |
 | `ThreadsScraper` | Threads profile, post, search, download | `threads.net` web/API endpoints | Internal defaults; custom cookie supported | `examples/threads.example.ts` | Use only legitimate session cookies |
+| `TiktokScraper` | TikTok video, user, post, and search metadata | `tiktok.com` web/API endpoints | Internal defaults; custom credentials supported | `examples/tiktok.example.ts` | `getVideo`, `getUser`, `getUserPosts`, `searchVideos`, `searchUsers` |
 | `TopAnimeScraper` | MyAnimeList top anime list | `myanimelist.net` | No | `examples/top-anime.example.ts` | `getTopAnime(limit)` |
 | `TwitterScraper` | Tweet, profile, search, timeline, downloads | `x.com/i/api/graphql` | Cookie/CSRF commonly required | `examples/twitter.example.ts` | Supports `credentials` option |
 | `UguuScraper` | Temporary file upload | `uguu.se/upload` | No | `examples/uguu.example.ts` | `upload(buffer, filename)` |
+| `UnblurVideoScraper` | Video enhancement workflow | `api.unblurimage.ai` | No | `examples/unblur-video.example.ts` | Buffer upload, OSS PUT, job creation, and polling |
+| `UnsplashScraper` | Unsplash photo search | `unsplash.com/napi/search/photos` | No | `examples/unsplash.example.ts` | `searchPhotos(query, page, perPage)` |
+| `UnwatermarkScraper` | Image watermark/text/logo restoration workflow | `api.unwatermark.ai` | No | `examples/unwatermark.example.ts` | Buffer upload, job creation, and polling |
 | `UpscalerScraper` | Image enhancement | `aienhancer.ai` | No | `examples/upscaler.example.ts` | Rejects remote URL string input |
+| `UpscalerV3Scraper` | Image upscale, background removal, style, and deblur actions | `imageupscaler.com/wp-admin/admin-ajax.php` | No | `examples/upscaler-v3.example.ts` | Requires explicit `nonce` and `pid` parameters |
+| `UsgsEarthquakeScraper` | Earthquake feeds and bounded event queries | USGS earthquake GeoJSON feeds and event API | No | `examples/usgs-earthquake.example.ts` | `getSummary`, `queryEvents`; validates feed/query bounds |
 | `VideyScraper` | Video upload | `videy.co/api/upload` | No | `examples/videy.example.ts` | `upload`, `uploadBuffer` |
 | `WallpaperScraper` | Wallpaper search | `wallhaven.cc/api/v1/search` | No | `examples/wallpaper.example.ts` | Replaced blocked HTML scraping with API integration |
 | `Webp2Mp4Scraper` | WebP to MP4/PNG conversion | `ezgif.com` | No | `examples/webp2mp4.example.ts` | `toMp4`, `toPng` |
+| `WikipediaScraper` | Wikipedia search and page summary | MediaWiki and REST summary APIs | No | `examples/wikipedia.example.ts` | `search`, `getSummary` |
+| `WorldBankScraper` | World Bank country and indicator metadata/data | World Bank API v2 | No | `examples/world-bank.example.ts` | `searchCountries`, `getCountry`, `searchIndicators`, `getIndicator` |
 | `WwCharScraper` | Wuthering Waves character info | Fandom MediaWiki parse API | No | `examples/ww-char.example.ts` | Uses `api.php?action=parse` |
+| `XiaohongshuScraper` | Xiaohongshu note metadata | Xiaohongshu web endpoints | Internal defaults; custom credentials supported | `examples/xiaohongshu.example.ts` | `getNote(url)` |
 | `YouTubeScraper` | YouTube metadata, search, playlist, channel, downloads | `yt-dlp`, `play-dl`, YouTube pages | No cookie option exposed in current scraper options | `examples/youtube.example.ts` | Media download features need external tools |
 
+## Scraper Response Reference
+
+Every scraper method still uses the standard `WsperResponse<T>` envelope. The table below documents `response.data` for scraper exports that were missing from the previous `Available Scrapers` table or only mentioned in prose.
+
+| Scraper | Primary methods | Success `response.data` |
+| --- | --- | --- |
+| `AioScraper` | `download(url)` | `AioResult`: optional `title`, `thumbnail`, `duration`, `source`, and `medias[]` entries with `url`, `quality`, `type`, `ext`, optional `size`. |
+| `ArenaAiScraper` | `getCategories()`, `getLeaderboard(category)` | Categories return `{ date, fetched_at, leaderboards, errors }`; leaderboard returns `{ meta, models }` where each model has rank, model, vendor, license, score, confidence interval, and votes. |
+| `BimasIslamScraper` | `getProvinces()`, `getCities(provinceId)`, `getPrayerTimes(provinceId, cityId, month, year)` | Region methods return `{ id, name }[]`; prayer times return daily entries with `tanggal`, `imsak`, `subuh`, `terbit`, `dhuha`, `dzuhur`, `ashar`, `maghrib`, and `isya`. |
+| `BinanceScraper` | `getTicker24hr(symbol)`, `getAveragePrice(symbol)`, `getKlines(symbol, interval, options?)`, `getOrderBook(symbol, limit?)` | Public market data only. Ticker/average/order book/klines preserve price and quantity strings from Binance and include source timestamps/IDs where available. No trading or account endpoints are implemented. |
+| `ClinicalTrialsScraper` | `searchStudies(query, options?)`, `getStudy(nctId)` | Search returns `{ totalCount, nextPageToken, studies }`; each study includes NCT ID, title, status, date fields, conditions, interventions, phases, study type, and lead sponsor metadata. |
+| `CodashopScraper` | `checkNickname(game, id, zone?)` | `{ success, game, name, userId, zoneId? }`. |
+| `CrossrefScraper` | `searchWorks(query, options?)`, `getWorkByDoi(doi)`, `searchJournals(query, options?)` | Works include DOI, title/subtitle arrays, publisher, type, URL, issued date, subjects, authors, reference count, and score. Journals include title, publisher, ISSNs, DOI count, and subjects. |
+| `DetikNewsScraper` | `search(query, resultType?)` | `DetikNewsItem[]` with `title`, `url`, `imageUrl`, `category`, `description`, and `date`. |
+| `DonghubScraper` | `search(query)`, `getDetail(url)` | Search returns `{ query, results }`; detail returns drama metadata including `title`, `url`, `image`, `genres`, `synopsis`, and `episodeList`. |
+| `DownrScraper` | `getVideo(url)` | `DownrResult`: optional `url`, `title`, `author`, `duration`, `thumbnail`, and `medias[]` entries with URL, quality, size, extension, and media type. |
+| `EcbScraper` | `getDailyReferenceRates()`, `convertFromEuro(amount, currency)` | Daily rates return `{ date, base: "EUR", rates }`; conversion returns amount, target currency, rate, converted amount, and rate date from the ECB eurofxref XML feed. |
+| `FacebookScraper` | `getUserProfile(usernameOrId)`, `getPostDetail(url)`, `downloadImage(url)` | Profile data, post detail data, or `{ url, buffer?, title? }` for downloaded image data. |
+| `FaceswapV2Scraper` | `swap(targetImageUrl, targetFaceUrl)` | `{ url, requestId? }`. |
+| `ForeignNewsScraper` | `getBbcNews(region?)`, `getCnbcNews()`, `getCnnNews()`, `getAntaraNews()`, `getRepublikaNews()` | `ForeignNewsItem[]` with `title`, `description`, `link`, `pubDate`, and `guid`. |
+| `GDriveScraper` | `getFileInfo(fileIdOrUrl)` | `{ filename, size, mimeType, directDownloadUrl, requiresConfirm, confirmToken }`. |
+| `GenshinImpactScraper` | `getMangaChapters(lang?)`, `getWikiCategories()`, `getWikiEntries(menuId, pageNum?, pageSize?)` | Manga chapters, wiki categories, or wiki entries with IDs, names, icons, rarity, weapon type, and element where available. |
+| `GoqrScraper` | `generateQrCode(data, size?)` | `Buffer` containing the generated QR code image bytes. |
+| `GutendexScraper` | `searchBooks(query, options?)`, `getBook(id)` | Search returns `{ count, next, previous, results }`; each book includes ID, title, authors/translators, subjects, bookshelves, languages, copyright flag, media type, format URLs, and download count. |
+| `HackerNewsScraper` | `getTopStories(limit?)`, `getNewStories(limit?)`, `getBestStories(limit?)`, `getItem(id)`, `getUser(username)` | Story list methods return `HackerNewsItem[]` with IDs, titles, authors, scores, URLs, comment IDs, timestamps, and descendant counts. Item/user methods return a single item or user object, or `null` when the Firebase API returns no object. |
+| `ImgflipScraper` | `getMemes()`, `captionMeme(options)` | Meme templates return `ImgflipMemeTemplate[]`; captioning returns `{ url, page_url }`. |
+| `KemendagScraper` | `getBapokDatasets()` | `KemendagBapokItem[]` with title, resource ID, format, download URL, and last-modified timestamp. |
+| `KompasNewsScraper` | `search(query)` | `KompasNewsItem[]` with `title`, `url`, `imageUrl`, `category`, `date`, and `description`. |
+| `LikeeScraper` | `getInfo(url)` | `{ pageUrl, thumbnail?, title?, playUrl? }`. |
+| `MyAnimeListScraper` | `search(query, limit?)`, `getTopAnime()` | `MalAnimeItem[]` with MAL ID, titles, type, episode count, status, score, rating, synopsis, image URL, and genres. |
+| `NanoBananaScraper` | `edit(buffer, prompt)` | `{ resultUrl }`. |
+| `OpenAlexScraper` | `searchWorks(query, options?)`, `getWork(idOrDoi)`, `searchAuthors(query, options?)`, `searchInstitutions(query, options?)` | Search methods return `{ meta, results }`. Works include IDs, DOI, title/display name, publication year, type, citation count, and open access status. Authors and institutions include counts and affiliation/country metadata. |
+| `OpenMeteoScraper` | `getForecast(latitude, longitude, options?)`, `getCurrentWeather(latitude, longitude)` | `OpenMeteoForecast` with coordinates, timezone, elevation, `currentWeather`, and optional `hourly`/`daily` series records. Invalid coordinates and forecast ranges return validation responses. |
+| `OpenLibraryScraper` | `searchBooks(query, options?)`, `getWork(workKey)`, `getAuthor(authorKey)`, `getByIsbn(isbn)` | Search returns Open Library docs with title, author names, first publish year, ISBNs, language, subjects, and cover ID. Work, author, and ISBN methods return normalized metadata for the requested Open Library key. |
+| `OsvScraper` | `queryPackage({ ecosystem, name, version? })`, `getVulnerability(id)` | Query returns `{ vulns }`; vulnerabilities include OSV ID, summary/details, aliases, published/modified timestamps, database-specific metadata, and affected package/range/version entries. |
+| `PackagistScraper` | `searchPackages(query, options?)`, `getPackage(name)` | Search returns package summary rows with name, description, repository, downloads, and favorites. Package lookup returns normalized Composer versions with license, authors, requirements, source URL, and dist URL. |
+| `PixivScraper` | `getIllust(illustId)` | `PixivArtwork` with artwork ID, title, description, type, created date, image URLs, tags, user ID, and user name. |
+| `PubgMobileScraper` | `getNews()` | `PubgMobileNewsItem[]` with `title`, `date`, `url`, and `summary`. |
+| `PypiScraper` | `getPackage(name)`, `getRelease(name, version)` | PyPI package metadata with normalized `info`, release file maps, and distribution URLs. File records include filename, package type, Python version, URL, size, upload time, and digests. |
+| `RemovebgScraper` | `remove(buffer)` | `{ buffer }` where `buffer` contains the background-removed image bytes. |
+| `RestCountriesScraper` | `getByName(name, options?)`, `getByCode(code, options?)`, `getAll(fields)` | Country records with names, ISO codes, capital, region, population, flags, timezones, coordinates, and optional currencies, languages, maps, and status fields. `getAll` requires explicit fields to avoid unbounded responses. |
+| `RorScraper` | `searchOrganizations(query, options?)`, `getOrganization(id)` | ROR organization metadata with ID, name, status, types, country, links, aliases, and acronyms. `getOrganization` accepts a compact ROR ID or `https://ror.org/...` URL. |
+| `SfileScraper` | `getMetadata(url)` | `SfileMetadata` with optional filename, size, author, upload date, download count, plus `pageUrl`. |
+| `SoundcloudScraper` | `getTrack(url)` | `SoundcloudTrack` with track metadata, artwork, duration, genre, user info, stats, and optional stream URL. |
+| `TeraboxScraper` | `getShareList(url)` | `{ shareid, uk, list }`; each file includes `fsId`, filename, size, directory flag, category, path, and optional download URL. |
+| `TextReplaceScraper` | `replace(buffer, originalText, replaceText, fileName?)` | `{ outputUrl }`. |
+| `TiktokScraper` | `getVideo(url)`, `getVideoDirect(url)`, `getUser(username)`, `getUserPosts(username, count?)`, `searchVideos(query, count?)`, `searchUsers(query, count?)` | Video metadata, user metadata, post arrays, search result objects, or user search arrays depending on the method. |
+| `UnblurVideoScraper` | `enhance(buffer, resolution?, fileName?)` | `{ jobId, inputUrl?, outputUrl? }`. |
+| `UnsplashScraper` | `searchPhotos(query, page?, perPage?)` | `UnsplashPhotoItem[]` with IDs, descriptions, dimensions, likes, image URLs, and photographer info. |
+| `UnwatermarkScraper` | `restore(buffer)` | `{ jobId, inputUrl?, outputUrl? }`. |
+| `UpscalerV3Scraper` | `process(buffer, params)` | `{ output }`; `params` must include `functionType`, explicit `nonce`, and explicit `pid`. |
+| `UsgsEarthquakeScraper` | `getSummary(feed?)`, `queryEvents(options?)` | `UsgsEarthquakeSummary` with feed metadata, generated timestamp, count, and normalized event rows including magnitude, place, ISO timestamps, coordinates, depth, significance, alert/status, and source URLs. |
+| `WikipediaScraper` | `search(query, lang?)`, `getSummary(title, lang?)` | Search returns result rows with title, page ID, snippet, and timestamp; summary returns title, extracts, description, optional thumbnail URL, and page URL. |
+| `WorldBankScraper` | `searchCountries(query, options?)`, `getCountry(code)`, `searchIndicators(query, options?)`, `getIndicator(country, indicator, options?)` | Country and indicator searches return `{ pagination, items }`; indicator data returns `{ pagination, values }` with country, indicator, date, numeric value, unit, observation status, and decimal metadata. |
+| `XiaohongshuScraper` | `getNote(url)` | `XiaohongshuNote` with title, description, type, timestamp, user, images, and engagement stats. |
|
|
653
|
+
|
|
472
654
|
## What's New
|
|
473
655
|
|
|
474
|
-
The latest
|
|
656
|
+
The latest research implementation passes added sixteen public, no-credential scrapers from the roadmap:
|
|
657
|
+
|
|
658
|
+
- `UsgsEarthquakeScraper` for USGS earthquake GeoJSON feeds and bounded event queries.
|
|
659
|
+
- `OpenMeteoScraper` for Open-Meteo forecast/current weather by coordinate.
|
|
660
|
+
- `RestCountriesScraper` for country lookup by name/code and bounded `getAll(fields)` lists.
|
|
661
|
+
- `HackerNewsScraper` for Hacker News stories, items, and users through the public Firebase API.
|
|
662
|
+
- `OpenLibraryScraper` for Open Library book search, work, author, and ISBN metadata.
|
|
663
|
+
- `GutendexScraper` for Project Gutenberg metadata through the Gutendex JSON API.
|
|
664
|
+
- `WorldBankScraper` for World Bank countries, indicators, and bounded indicator time-series.
|
|
665
|
+
- `EcbScraper` for ECB daily euro reference rates and EUR conversion helpers.
|
|
666
|
+
- `CrossrefScraper` for scholarly work and journal metadata through the Crossref REST API.
|
|
667
|
+
- `OpenAlexScraper` for scholarly works, authors, and institutions.
|
|
668
|
+
- `RorScraper` for research organization registry search and lookup.
|
|
669
|
+
- `ClinicalTrialsScraper` for ClinicalTrials.gov public study search and detail lookup.
|
|
670
|
+
- `PypiScraper` for Python package and release metadata through the PyPI JSON API.
|
|
671
|
+
- `PackagistScraper` for PHP Composer package metadata and search.
|
|
672
|
+
- `OsvScraper` for public OSV package vulnerability queries and vulnerability detail lookup.
|
|
673
|
+
- `BinanceScraper` for Binance public market data only, with no account or trading endpoints.
|
|
674
|
+
- Verification for the latest full pass: `npm run typecheck`, `npm run test`, and `npm run build`; full Vitest suite reported 132 test files passed and 441 tests passed.
|
|
675
|
+
|
|
676
|
+
The previous scraper reliability pass fixed 12 failing scraper functions and their corresponding tests.
|
|
475
677
|
|
|
476
678
|
- `LyricsScraper` now uses LRCLIB's public JSON API instead of a Cloudflare-protected Musixmatch HTML page.
|
|
477
679
|
- `WallpaperScraper` now uses Wallhaven's public search API instead of a Cloudflare-protected Wallpaperflare HTML page.
|
|
@@ -537,6 +739,22 @@ npx tsx examples/wallpaper.example.ts cyberpunk
|
|
|
537
739
|
npx tsx examples/stalk.example.ts axios
|
|
538
740
|
npx tsx examples/mediafire.example.ts "https://www.mediafire.com/file/ipnyzofjcwri357/test-10mb.bin/file"
|
|
539
741
|
npx tsx examples/upscaler.example.ts testassets/photo.jpg
|
|
742
|
+
npx tsx examples/usgs-earthquake.example.ts all_day
|
|
743
|
+
npx tsx examples/open-meteo.example.ts -6.2 106.8
|
|
744
|
+
npx tsx examples/restcountries.example.ts Indonesia
|
|
745
|
+
npx tsx examples/hacker-news.example.ts 10
|
|
746
|
+
npx tsx examples/open-library.example.ts dune
|
|
747
|
+
npx tsx examples/gutendex.example.ts alice
|
|
748
|
+
npx tsx examples/world-bank.example.ts IDN SP.POP.TOTL
|
|
749
|
+
npx tsx examples/ecb.example.ts 100 USD
|
|
750
|
+
npx tsx examples/crossref.example.ts "open science"
|
|
751
|
+
npx tsx examples/openalex.example.ts "open science"
|
|
752
|
+
npx tsx examples/ror.example.ts "University of Indonesia"
|
|
753
|
+
npx tsx examples/clinical-trials.example.ts diabetes
|
|
754
|
+
npx tsx examples/pypi.example.ts requests
|
|
755
|
+
npx tsx examples/packagist.example.ts monolog/monolog
|
|
756
|
+
npx tsx examples/osv.example.ts PyPI requests
|
|
757
|
+
npx tsx examples/binance.example.ts BTCUSDT
|
|
540
758
|
```
|
|
541
759
|
|
|
542
760
|
For scrapers that support cookies or credentials, use only accounts and sessions you are authorized to access. Do not hardcode real cookies, tokens, client secrets, or API keys in source files.
|
|
@@ -571,6 +789,24 @@ npm run build
|
|
|
571
789
|
npx tsx examples/alllexamp.ts
|
|
572
790
|
```
|
|
573
791
|
|
|
792
|
+
### Live/e2e tests and credentials
|
|
793
|
+
|
|
794
|
+
Unit tests run fully offline (mocks + local fixture servers) and never need
|
|
795
|
+
real credentials. Tests that hit live platform APIs are guarded by
|
|
796
|
+
`tests/helpers/credentials.ts` — when the required `WSPER_*` env variables are
|
|
797
|
+
missing, those tests are **skipped with a clear message** instead of failing:
|
|
798
|
+
|
|
799
|
+
```bash
|
|
800
|
+
# everything offline — live tests skip cleanly
|
|
801
|
+
npm run test
|
|
802
|
+
|
|
803
|
+
# include live Spotify tests
|
|
804
|
+
WSPER_SPOTIFY_CLIENT_ID=... WSPER_SPOTIFY_CLIENT_SECRET=... npm run test:spotify
|
|
805
|
+
|
|
806
|
+
# include live Instagram tests
|
|
807
|
+
WSPER_INSTAGRAM_COOKIE=... npm run test:instagram
|
|
808
|
+
```
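
The skip-when-missing behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `tests/helpers/credentials.ts`: the names `checkLiveCredentials` and `skipMessage` are hypothetical, and the environment is passed in explicitly so the snippet runs without any Node typings.

```ts
// Hypothetical helper names; the real tests/helpers/credentials.ts may differ.
type Env = Record<string, string | undefined>;

interface CredentialCheck {
  ok: boolean;
  missing: string[];
}

// Report which of the required variables are absent or blank.
function checkLiveCredentials(required: string[], env: Env): CredentialCheck {
  const missing = required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
  return { ok: missing.length === 0, missing };
}

// Build the message shown when a live suite is skipped.
function skipMessage(suite: string, check: CredentialCheck): string {
  return `Skipping live ${suite} tests: missing ${check.missing.join(", ")}`;
}

// Example: only the client ID is set, so the Spotify suite would be skipped.
const spotifyCheck = checkLiveCredentials(
  ["WSPER_SPOTIFY_CLIENT_ID", "WSPER_SPOTIFY_CLIENT_SECRET"],
  { WSPER_SPOTIFY_CLIENT_ID: "abc" },
);
if (!spotifyCheck.ok) {
  console.log(skipMessage("Spotify", spotifyCheck));
}
```

In a Vitest suite, a check like this could feed `describe.skipIf(!check.ok)`, so the live block is reported as skipped rather than failed.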
|
|
809
|
+
|
|
574
810
|
## Project Structure
|
|
575
811
|
|
|
576
812
|
```txt
|
|
@@ -599,6 +835,57 @@ tests/
|
|
|
599
835
|
dist/ Build output only; do not edit manually
|
|
600
836
|
```
|
|
601
837
|
|
|
838
|
+
## Credentials Configuration
|
|
839
|
+
|
|
840
|
+
The library never ships personal credentials. Resolution order per scraper:
|
|
841
|
+
|
|
842
|
+
1. **Constructor injection (recommended)** — pass `options.credentials` when creating a scraper.
|
|
843
|
+
2. **Environment variables** — optional `WSPER_*` fallbacks read lazily by `src/core/credentials.ts`.
|
|
844
|
+
3. **Nothing** — the scraper runs in public mode, or methods that require credentials fail with a clear error code (e.g. `CAI_CREDENTIALS_REQUIRED`, `SPOTIFY_CREDENTIALS_MISSING`).
|
|
845
|
+
|
|
846
|
+
```ts
|
|
847
|
+
const scraper = new InstagramScraper({
|
|
848
|
+
credentials: { cookie: process.env.WSPER_INSTAGRAM_COOKIE },
|
|
849
|
+
queue: {
|
|
850
|
+
concurrency: 1,
|
|
851
|
+
minDelayMs: 1500,
|
|
852
|
+
maxDelayMs: 4000,
|
|
853
|
+
timeoutMs: 30000,
|
|
854
|
+
retries: 2,
|
|
855
|
+
},
|
|
856
|
+
});
|
|
857
|
+
```
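
The three-step resolution order can be pictured with a small standalone function. This is a sketch under assumptions, not the code in `src/core/credentials.ts`: `resolveCredential` and the `Resolved` type are hypothetical names, and empty strings are treated as "not provided".

```ts
// Hypothetical sketch of the resolution order; not the library's implementation.
type Env = Record<string, string | undefined>;

type Resolved =
  | { source: "constructor" | "env"; value: string }
  | { source: "none"; value: undefined };

function resolveCredential(
  injected: string | undefined,
  envName: string,
  env: Env,
): Resolved {
  // 1. Constructor injection wins when a non-empty value is passed.
  if (injected !== undefined && injected !== "") {
    return { source: "constructor", value: injected };
  }
  // 2. Otherwise fall back to the matching WSPER_* variable, read on demand.
  const fromEnv = env[envName];
  if (fromEnv !== undefined && fromEnv !== "") {
    return { source: "env", value: fromEnv };
  }
  // 3. Nothing found: the caller runs in public mode or raises its own error code.
  return { source: "none", value: undefined };
}

const demoEnv: Env = { WSPER_INSTAGRAM_COOKIE: "cookie-from-env" };
console.log(resolveCredential("cookie-from-options", "WSPER_INSTAGRAM_COOKIE", demoEnv).source);
console.log(resolveCredential(undefined, "WSPER_INSTAGRAM_COOKIE", demoEnv).source);
console.log(resolveCredential(undefined, "WSPER_THREADS_COOKIE", demoEnv).source);
```

The three calls print `constructor`, `env`, and `none` in that order.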
|
|
858
|
+
|
|
859
|
+
Supported credential env variables (see `.env.example`; copy it to `.env` when running the examples):
|
|
860
|
+
|
|
861
|
+
| Variable | Platform |
|
|
862
|
+
| --- | --- |
|
|
863
|
+
| `WSPER_SPOTIFY_CLIENT_ID` / `WSPER_SPOTIFY_CLIENT_SECRET` | Spotify Web API |
|
|
864
|
+
| `WSPER_SPOTIFY_CALLBACK_URL` / `WSPER_SPOTIFY_MARKET` | Spotify OAuth / market |
|
|
865
|
+
| `WSPER_INSTAGRAM_COOKIE` | Instagram |
|
|
866
|
+
| `WSPER_THREADS_COOKIE` | Threads |
|
|
867
|
+
| `WSPER_TWITTER_COOKIE` | Twitter/X |
|
|
868
|
+
| `WSPER_PINTEREST_COOKIE` | Pinterest |
|
|
869
|
+
| `WSPER_TIKTOK_COOKIE` | TikTok |
|
|
870
|
+
| `WSPER_FACEBOOK_COOKIE` | Facebook |
|
|
871
|
+
| `WSPER_BILIBILI_COOKIE` | BiliBili |
|
|
872
|
+
| `WSPER_CAI_TOKEN` | Character.AI |
|
|
873
|
+
| `REMOVEBG_API_KEY` | remove.bg |
|
|
874
|
+
|
|
875
|
+
For running the bundled examples with your own account, copy
|
|
876
|
+
`examples/config/credentials.example.ts` to `examples/config/credentials.ts`
|
|
877
|
+
(gitignored) and fill in your values. Examples run in two modes via
|
|
878
|
+
`WSPER_EXAMPLE_MODE`:
|
|
879
|
+
|
|
880
|
+
- `smoke` (default) — fast, few requests, local credentials are **not** loaded; pair with `WSPER_MOCK_BASE_URL` for fully offline runs.
|
|
881
|
+
- `real` — loads `examples/config/credentials.ts` (or `.env`) and uses a polite queue. Use your own cookies/tokens only; you bear the platform ToS and rate-limit risk.
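
As a rough illustration of the two modes, the sketch below parses a raw `WSPER_EXAMPLE_MODE` value and decides whether local credentials should be loaded. The function names are hypothetical, and the choices to ignore case and to fall back to `smoke` for unknown values are assumptions, not documented behaviour of `examples/config/runtime.ts`.

```ts
// Hypothetical sketch; the examples helper may parse the mode differently.
type ExampleMode = "smoke" | "real";

// Assumption: anything other than "real" (case-insensitive) falls back to the default.
function parseExampleMode(raw: string | undefined): ExampleMode {
  return raw?.trim().toLowerCase() === "real" ? "real" : "smoke";
}

// Local credentials are only loaded in "real" mode.
function shouldLoadLocalCredentials(mode: ExampleMode): boolean {
  return mode === "real";
}

const mode = parseExampleMode(undefined);
console.log(mode, shouldLoadLocalCredentials(mode));
```

With no value set, the mode resolves to `smoke` and local credentials stay unloaded.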
|
|
882
|
+
|
|
883
|
+
> **Migration note:** `src/core/credentials.ts` previously contained baked-in
|
|
884
|
+
> default cookies/tokens. These were removed; the exported `credentials` object
|
|
885
|
+
> now resolves from `WSPER_*` env variables and is empty by default. If you
|
|
886
|
+
> relied on the old defaults, inject credentials via constructor options or env.
|
|
887
|
+
> Rotate any cookies/tokens that were ever committed.
|
|
888
|
+
|
|
602
889
|
## Environment Variables
|
|
603
890
|
|
|
604
891
|
These variables appear in the repository:
|
|
@@ -612,7 +899,7 @@ These variables appear in the repository:
|
|
|
612
899
|
| `BILI_COOKIE` | `examples/bilibili.example.ts` | No | Optional BiliBili cookie for authenticated stream access. |
|
|
613
900
|
| `WSPER_COOKIE` | `tests/core/credentials.test.ts` | No | Test-only variable proving runtime credential resolution does not read env credentials automatically. |
|
|
614
901
|
|
|
615
|
-
Credential configuration is constructor-
|
|
902
|
+
Credential configuration is constructor-first with optional `WSPER_*` env fallbacks (see "Credentials Configuration" above). The library itself never reads `.env` files; only the examples helper (`examples/config/runtime.ts`) parses a local `.env` for convenience.
|
|
616
903
|
|
|
617
904
|
```ts
|
|
618
905
|
import { WsperScraper } from "wsper-js";
|
|
@@ -623,6 +910,9 @@ const wsper = new WsperScraper({
|
|
|
623
910
|
clientSecret: "your-client-secret",
|
|
624
911
|
},
|
|
625
912
|
credentials: {
|
|
913
|
+
cai: {
|
|
914
|
+
bearerToken: "<YOUR_TOKEN_HERE>",
|
|
915
|
+
},
|
|
626
916
|
twitter: {
|
|
627
917
|
cookie: "<YOUR_COOKIE_HERE>",
|
|
628
918
|
csrfToken: "<YOUR_CSRF_TOKEN_HERE>",
|