@ioodev/nodescraper 1.1.1

package/CHANGELOG.md ADDED
@@ -0,0 +1,106 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project are documented in this file.
4
+ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
5
+ and this project follows [Semantic Versioning](https://semver.org/).
6
+
7
+ ## [1.1.1] — 2026-06-19
8
+
9
+ ### Changed
10
+
11
+ - **Package renamed from `@riodevnet/nodescraper` to `@ioodev/nodescraper`.**
12
+ npm scopes are tied 1:1 to an account/organization name and cannot be
13
+ renamed in place, so this is a fresh publish under the `@ioodev` scope
14
+ rather than an update to the old package. No code or API changes.
15
+ - Updated all `github.com/riodevnet/...` references (README, `package.json`
16
+ `repository`/`homepage`/`bugs`, default User-Agent string in
17
+ `src/constants.js`, TypeScript header comment) to `github.com/ioodev/...`.
18
+
19
+ ### Compatibility
20
+
21
+ No functional changes. If you depend on `@riodevnet/nodescraper`, see the
22
+ README's "Migrating from `@riodevnet/nodescraper`" section — it's a
23
+ drop-in rename: just change the import/install path to `@ioodev/nodescraper`.
24
+
25
+ ## [1.1.0] — 2026-06-19
26
+
27
+ A bug-fix-and-feature release. Every existing v1.0 method keeps its name
28
+ and return shape — code written against 1.0 keeps working — but several
29
+ returned values are now *correct* where they previously weren't, and a
30
+ number of long-missing capabilities (error visibility, raw-HTML loading,
31
+ JSON-LD, etc.) have been added.
32
+
33
+ ### Fixed
34
+
35
+ - **`keywords()` / `viewport()` returned untrimmed strings.** Splitting
36
+ `"example, domain, test"` on `,` used to yield `["example", " domain",
37
+ " test"]` (note the leading spaces). Entries are now trimmed and empty
38
+ entries are dropped.
39
+ - **`link_details()` returned `[""]` instead of `[]`** for `rel` on links
40
+ with no `rel` attribute, which broke `.includes('nofollow')`-style
41
+ checks on those links. `rel` is now reliably `[]` when absent.
42
+ - **Failed loads were completely silent.** `init()` swallowed every error
43
+ (network failure, timeout, 404/403/500 responses, invalid URLs) and just
44
+ left `soup` as `null`, with no way to find out *why*. `init()` now
45
+ records the failure on `this.error` / `this.statusCode`, exposed via
46
+ `getError()`, `getStatusCode()`, and an explicit `isLoaded()` check.
47
+ - **No default `User-Agent`.** Axios' default UA string causes many real
48
+ sites to return 403 or an empty body. NodeScraper now sends a realistic
49
+ browser-like `User-Agent` by default (overridable via `options.userAgent`
50
+ or `options.headers`).
51
+ - **`filter()` could throw on a malformed selector** (e.g. a typo'd
52
+ attribute/class selector) and crash the caller. It now fails soft and
53
+ returns `null` (`[]` when `multiple: true`), in line with the other getters.
54
+ - **No protocol allow-list.** `_isValidUrl()` accepted any URL that the
55
+ `URL` constructor could parse, including non-HTTP schemes. Targets are
56
+ now restricted to `http:`/`https:` by default (configurable via
57
+ `options.allowedProtocols`), failing fast with a clear error instead of
58
+ relying on the underlying HTTP client to reject it.
59
+
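The first two fixes above (trimmed `keywords()` / `viewport()` entries and an empty `rel` array) can be sketched in plain JavaScript. This is a minimal illustration of the described behavior, not the package's actual source; `splitList` and `parseRel` are hypothetical names introduced here.

```js
// Hypothetical helpers, not the package's actual source.

// Split a comma-separated content attribute, trim each entry, drop empties.
function splitList(content) {
  if (typeof content !== "string") return null;
  return content
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Split a rel attribute on whitespace; a missing or blank value yields [].
function parseRel(rel) {
  if (typeof rel !== "string") return [];
  return rel.split(/\s+/).filter((token) => token.length > 0);
}

console.log(JSON.stringify(splitList("example, domain, test")));
// ["example","domain","test"]
console.log(JSON.stringify(parseRel(undefined)));
// []
console.log(parseRel("nofollow noopener").includes("nofollow"));
// true
```

With an empty array, `.includes('nofollow')`-style checks simply return `false` on links that carry no `rel` attribute.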
60
+ ### Added
61
+
62
+ - `loadHTML(html)` — parse a raw HTML string synchronously, with no HTTP
63
+ request. Useful for tests or HTML obtained some other way.
64
+ - `meta(name, attr)` — generic meta tag reader for any `name`/`property`.
65
+ - `lang()`, `robots()`, `favicon()` — new metadata getters. `favicon()`
66
+ resolves to an absolute URL using the page URL as the base.
67
+ - `jsonLd()` — extracts and parses every `<script type="application/ld+json">`
68
+ block on the page, skipping malformed ones.
69
+ - `text()` — whitespace-normalized visible body text.
70
+ - `html()` — the raw HTML of the last successful load.
71
+ - `viewport_object()` — viewport directives parsed into a key/value object.
72
+ - `toJSON()` — a ready-to-serialize snapshot of the most commonly used fields.
73
+ - `link_details()` / `image_details()` now include an `absolute_url` field
74
+ resolved against the page URL, alongside the original (possibly relative)
75
+ `url`.
76
+ - Constructor `options`: `timeout`, `userAgent`, `headers`, `maxRedirects`,
77
+ `allowedProtocols`, `throwOnError`.
78
+ - `NodeScraper.scrape(url, options)` and `NodeScraper.scrapeAll(urls, options)`
79
+ static convenience methods for one-line and concurrent scraping.
80
+ - TypeScript declarations (`types/index.d.ts`), referenced via the
81
+ package's `types` field.
82
+ - Test suite (`node --test`) covering metadata extraction, link/image
83
+ details, `filter()`, and `init()` against a local HTTP server (404,
84
+ redirects, UA-blocking, timeouts, unsupported protocols).
85
+ - Runnable examples under `examples/`.
86
+
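The `absolute_url` field and `favicon()` are described as resolving against the page URL. A minimal sketch of that kind of resolution, using the standard `URL` constructor, might look like the following. `toAbsoluteUrl` is a hypothetical name, and returning `null` on unparsable input is an assumption rather than documented behavior.

```js
// Hypothetical helper, not the package's actual source.
// Resolve a possibly relative href/src against the page URL.
function toAbsoluteUrl(url, pageUrl) {
  try {
    return new URL(url, pageUrl).href;
  } catch (err) {
    return null; // unparsable input or base: fail soft (assumed behavior)
  }
}

console.log(toAbsoluteUrl("/favicon.ico", "https://example.com/blog/post"));
// https://example.com/favicon.ico
console.log(toAbsoluteUrl("img/logo.png", "https://example.com/blog/post"));
// https://example.com/blog/img/logo.png
console.log(toAbsoluteUrl("https://cdn.example.net/a.js", "https://example.com/"));
// https://cdn.example.net/a.js
```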
87
+ ### Changed
88
+
89
+ - Project reorganized into `src/` (implementation), `test/`, `examples/`,
90
+ and `types/`, with `index.js` at the root re-exporting `src/NodeScraper.js`
91
+ for a stable import path. See the README's "Project Structure" section.
92
+ - `package.json` gained `engines`, `repository`, `homepage`, `bugs`,
93
+ `files`, `exports`, and `types` fields.
94
+
95
+ ### Compatibility
96
+
97
+ No breaking changes. All v1.0 method signatures and return *types* are
98
+ unchanged; only the contents of a few previously-incorrect return values
99
+ were fixed (see above). If your code relied on the untrimmed keyword/viewport
100
+ strings or on `rel: ['']`, double-check those spots.
101
+
102
+ ## [1.0.0]
103
+
104
+ Initial release: metadata extraction (title, description, Open Graph,
105
+ Twitter Card, canonical, CSRF, etc.), heading/list/image/link extraction,
106
+ and the `filter()` custom DOM query helper.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 SnakyScraper
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,449 @@
1
+ # 🕸️ NodeScraper
2
+
3
+ **NodeScraper** is a fast and flexible Node.js web scraping toolkit built on [Axios](https://www.npmjs.com/package/axios) and [Cheerio](https://www.npmjs.com/package/cheerio). It gives you a small, predictable API for pulling structured metadata and HTML out of a page — titles, Open Graph/Twitter Card tags, JSON-LD, headings, lists, images, links, and arbitrary DOM fragments — with clean, consistent return values.
4
+
5
+ > Fast. Clean. JavaScript-style scraping. 🕸️⚡
6
+
7
+ ![version](https://img.shields.io/badge/version-1.1.1-blue)
8
+ ![license](https://img.shields.io/badge/license-MIT-green)
9
+ ![node](https://img.shields.io/badge/node-%3E%3D16-brightgreen)
10
+
11
+ ---
12
+
13
+ ## Table of Contents
14
+
15
+ - [What's new in 1.1.0](#-whats-new-in-110)
16
+ - [Features](#-features)
17
+ - [Installation](#-installation)
18
+ - [Quick start](#-quick-start)
19
+ - [Error handling](#-error-handling)
20
+ - [API reference](#-api-reference)
21
+ - [Custom DOM filtering](#-custom-dom-filtering)
22
+ - [TypeScript](#-typescript)
23
+ - [Project structure](#-project-structure)
24
+ - [Testing](#-testing)
25
+ - [Examples](#-examples)
26
+ - [Migrating from 1.0.x](#-migrating-from-10x)
27
+ - [Migrating from @riodevnet/nodescraper](#-migrating-from-riodevnetnodescraper)
28
+ - [Contributing](#-contributing)
29
+ - [License](#-license)
30
+
31
+ ---
32
+
33
+ ## 📦 Package renamed: `@riodevnet/nodescraper` → `@ioodev/nodescraper`
34
+
35
+ As of **v1.1.1**, this package is published under the `@ioodev` scope. npm
36
+ scopes are tied to the account/organization name and can't be renamed
37
+ in-place, so this is a fresh publish under the new scope rather than an
38
+ update to the old one. If you're on `@riodevnet/nodescraper`, switch to
39
+ `@ioodev/nodescraper` — the API is unchanged. See
40
+ [Migrating from `@riodevnet`](#-migrating-from-riodevnetnodescraper) below.
41
+
42
+ ## 🆕 What's new in 1.1.0
43
+
44
+ This release fixes several real bugs and adds capabilities that were
45
+ missing from 1.0 — full details in [`CHANGELOG.md`](./CHANGELOG.md).
46
+
47
+ **Fixed**
48
+ - `keywords()` / `viewport()` no longer return untrimmed strings (`" domain"` → `"domain"`).
49
+ - `link_details().rel` is `[]` instead of `['']` when a link has no `rel` attribute.
50
+ - Failed loads are no longer silent — `getError()` / `getStatusCode()` tell you *why* a scrape failed (network error, timeout, 404/403/500, invalid URL).
51
+ - A realistic default `User-Agent` is now sent, so sites that block the bare Axios UA no longer fail with no explanation.
52
+ - `filter()` fails soft (`null`) instead of throwing on a malformed selector.
53
+ - URLs are restricted to `http:`/`https:` by default, failing fast with a clear error.
54
+
55
+ **Added**
56
+ - `loadHTML(html)` — parse a raw HTML string with no network request.
57
+ - `meta()`, `lang()`, `robots()`, `favicon()`, `jsonLd()`, `text()`, `html()`, `viewport_object()`, `toJSON()`.
58
+ - `absolute_url` field on `link_details()` / `image_details()`.
59
+ - Constructor options: `timeout`, `userAgent`, `headers`, `maxRedirects`, `allowedProtocols`, `throwOnError`.
60
+ - `NodeScraper.scrape()` / `NodeScraper.scrapeAll()` static convenience methods.
61
+ - TypeScript declarations, a real test suite, and runnable examples.
62
+
63
+ Nothing here is a breaking change to method names or return *shapes* — see [Migrating from 1.0.x](#-migrating-from-10x) if you depended on the buggy behavior.
64
+
65
+ ---
66
+
67
+ ## 🚀 Features
68
+
69
+ - ✅ Page metadata: title, description, keywords, author, charset, lang, robots, favicon, and more
70
+ - ✅ Open Graph, Twitter Card, canonical, CSRF token, and JSON-LD structured data
71
+ - ✅ HTML extraction: `h1`–`h6`, `p`, `ul`, `ol`, images, links — with absolute URLs resolved for you
72
+ - ✅ Powerful `filter()` method with class/ID/tag selectors for arbitrary DOM fragments
73
+ - ✅ Clear error reporting (`getError()`, `getStatusCode()`, `isLoaded()`) instead of silent failures
74
+ - ✅ Load from a live URL **or** from a raw HTML string (`loadHTML()`) — easy to test and reuse
75
+ - ✅ Configurable timeout, headers, User-Agent, redirects, and allowed protocols
76
+ - ✅ One-line single/batch scraping via `NodeScraper.scrape()` / `scrapeAll()`
77
+ - ✅ Ships with TypeScript declarations
78
+ - ✅ Zero-dependency test suite using Node's built-in test runner
79
+
80
+ ---
81
+
82
+ ## 📦 Installation
83
+
84
+ ```bash
+ npm install @ioodev/nodescraper
+ ```
87
+
88
+ > Requires Node.js 16 or later.
89
+
90
+ ---
91
+
92
+ ## 🛠️ Quick start
93
+
94
+ ```js
+ const NodeScraper = require("@ioodev/nodescraper");
+
+ (async () => {
+   const scraper = new NodeScraper("https://example.com");
+   await scraper.init();
+
+   if (!scraper.isLoaded()) {
+     console.error("Scrape failed:", scraper.getError().message);
+     return;
+   }
+
+   console.log(scraper.title());       // "Welcome to Example.com"
+   console.log(scraper.description()); // "This is the example meta description."
+   console.log(scraper.h1());          // ["Welcome", "Latest News"]
+   console.log(scraper.open_graph());  // { "og:title": "...", "og:description": "...", ... }
+
+   // One call, every common field:
+   console.log(scraper.toJSON());
+ })();
+ ```
115
+
116
+ Or with the one-line convenience wrapper:
117
+
118
+ ```js
+ // Inside an async function (CommonJS has no top-level await):
+ const scraper = await NodeScraper.scrape("https://example.com");
+ ```
121
+
122
+ ---
123
+
124
+ ## ⚠️ Error handling
125
+
126
+ Unlike 1.0, failures are no longer silent. After `init()`, always check
127
+ `isLoaded()` (or `getError()`) before calling the getters:
128
+
129
+ ```js
+ const scraper = await NodeScraper.scrape("https://example.com/maybe-missing");
+
+ if (!scraper.isLoaded()) {
+   console.error(scraper.getError().message); // e.g. "Request failed with status code 404"
+   console.error(scraper.getStatusCode());    // 404, or null for network-level failures
+ } else {
+   console.log(scraper.title());
+ }
+ ```
139
+
140
+ If you'd rather handle failures with try/catch, pass `throwOnError: true`:
141
+
142
+ ```js
+ try {
+   const scraper = await NodeScraper.scrape(url, { throwOnError: true });
+   console.log(scraper.title());
+ } catch (err) {
+   console.error("Scrape failed:", err.message);
+ }
+ ```
150
+
151
+ When no document is loaded (before `init()`/`loadHTML()`, or after a failed
152
+ load), every getter returns `null` rather than throwing — it's always safe
153
+ to call them; you just won't get data back.
154
+
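To illustrate how a status code can be `null` for network-level failures, here is a minimal sketch of mapping an Axios-style error to the two values the README exposes. It relies on the fact that Axios attaches a `response` object to errors raised for HTTP error statuses and leaves it undefined when no response arrived. `describeFailure` is a hypothetical name, not the package's API, and the errors below are simulated.

```js
// Hypothetical helper, not the package's actual source.
function describeFailure(err) {
  return {
    message: err && err.message ? err.message : "Unknown error",
    // No `response` means the request never got an HTTP reply.
    statusCode: err && err.response ? err.response.status : null,
  };
}

// Simulated errors shaped like the ones Axios rejects with.
const notFound = Object.assign(
  new Error("Request failed with status code 404"),
  { response: { status: 404 } }
);
const timedOut = new Error("timeout of 10000ms exceeded");

console.log(describeFailure(notFound).statusCode); // 404
console.log(describeFailure(timedOut).statusCode); // null
console.log(describeFailure(timedOut).message);    // timeout of 10000ms exceeded
```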
155
+ ---
156
+
157
+ ## 🧪 API reference
158
+
159
+ ### Constructor
160
+
161
+ ```js
+ new NodeScraper(url, options);
+ ```
164
+
165
+ | Option | Type | Default | Description |
166
+ |---------------------|------------|-----------------------------------|------------------------------------------------------------|
167
+ | `timeout` | `number` | `10000` | Request timeout, in ms. |
168
+ | `userAgent` | `string` | a realistic browser-like UA | Sent as the `User-Agent` header. |
169
+ | `headers` | `object` | `{}` | Extra headers merged into the request. |
170
+ | `maxRedirects` | `number` | `5` | Maximum redirects to follow. |
171
+ | `allowedProtocols` | `string[]` | `['http:', 'https:']` | Protocols accepted by the URL validator. |
172
+ | `throwOnError` | `boolean` | `false` | If `true`, `init()` rejects instead of recording the error. |
173
+
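The `allowedProtocols` option amounts to a protocol allow-list applied after URL parsing. A minimal sketch of that check, using only the standard `URL` constructor, could look like this. `checkUrl` and the wording of its `reason` strings are assumptions for illustration; the package's own `_isValidUrl()` may differ.

```js
// Hypothetical helper, not the package's actual `_isValidUrl()`.
const DEFAULT_PROTOCOLS = ["http:", "https:"];

function checkUrl(input, allowedProtocols = DEFAULT_PROTOCOLS) {
  let parsed;
  try {
    parsed = new URL(input); // throws on unparsable input
  } catch (err) {
    return { ok: false, reason: `Invalid URL: ${input}` };
  }
  // `protocol` includes the trailing colon, e.g. "https:".
  if (!allowedProtocols.includes(parsed.protocol)) {
    return { ok: false, reason: `Unsupported protocol: ${parsed.protocol}` };
  }
  return { ok: true, reason: null };
}

console.log(checkUrl("https://example.com").ok);        // true
console.log(checkUrl("ftp://example.com/file").reason); // Unsupported protocol: ftp:
console.log(checkUrl("not a url").reason);              // Invalid URL: not a url
```

Checking the protocol up front lets the caller fail before any request is attempted, which matches the "failing fast" behavior described in the changelog.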
174
+ ### Loading
175
+
176
+ ```js
+ await scraper.init();          // fetch `url` and parse the response
+ scraper.loadHTML(htmlString);  // parse a raw HTML string, no network request
+ scraper.isLoaded();            // boolean
+ scraper.getError();            // Error | null
+ scraper.getStatusCode();       // number | null
+ ```
183
+
184
+ ### Page metadata
185
+
186
+ ```js
+ scraper.title();
+ scraper.description();
+ scraper.keywords();          // string[] | null, trimmed
+ scraper.keyword_string();    // raw "keywords" content attribute
+ scraper.charset();
+ scraper.lang();              // <html lang="...">
+ scraper.canonical();
+ scraper.content_type();
+ scraper.author();
+ scraper.csrf_token();
+ scraper.image();             // shorthand for og:image
+ scraper.favicon();           // absolute URL
+ scraper.robots();
+ scraper.viewport();          // string[] | null, e.g. ["width=device-width", "initial-scale=1"]
+ scraper.viewport_string();   // raw content attribute
+ scraper.viewport_object();   // { width: "device-width", "initial-scale": "1" }
+ scraper.meta("theme-color"); // any meta[name=...] (pass attr: 'property' for meta[property=...])
+ ```
205
+
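As an illustration of the `viewport_object()` shape shown above, the following sketch parses a viewport `content` string into a key/value object. `parseViewport` is a hypothetical name and this is only one plausible way to produce that shape, not the package's implementation.

```js
// Hypothetical helper, not the package's actual source.
function parseViewport(content) {
  if (typeof content !== "string") return null;
  const result = {};
  for (const directive of content.split(",")) {
    // Split on the first "=" only, so values containing "=" survive.
    const [key, ...rest] = directive.split("=");
    const name = key.trim();
    if (name) result[name] = rest.join("=").trim();
  }
  return result;
}

console.log(JSON.stringify(parseViewport("width=device-width, initial-scale=1")));
// {"width":"device-width","initial-scale":"1"}
```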
206
+ ### Open Graph, Twitter Card & JSON-LD
207
+
208
+ ```js
+ scraper.open_graph();                  // all known og:* properties
+ scraper.open_graph("og:title");        // a single property
+
+ scraper.twitter_card();
+ scraper.twitter_card("twitter:title");
+
+ scraper.jsonLd();                      // parsed array of every <script type="application/ld+json"> block
+ ```
217
+
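The changelog notes that `jsonLd()` skips malformed blocks. A minimal sketch of that skip-on-error parsing, assuming the raw text of each `<script type="application/ld+json">` element has already been collected into an array, might look like this. `parseJsonLdBlocks` is a hypothetical name, not part of the package.

```js
// Hypothetical helper, not the package's actual source.
// `rawBlocks` holds the text content of each JSON-LD <script> element.
function parseJsonLdBlocks(rawBlocks) {
  const parsed = [];
  for (const raw of rawBlocks) {
    try {
      parsed.push(JSON.parse(raw));
    } catch (err) {
      // Malformed JSON: skip this block and keep going.
    }
  }
  return parsed;
}

const sampleBlocks = [
  '{"@context":"https://schema.org","@type":"Article","headline":"Hello"}',
  '{"@type": "Broken", ', // truncated, will be skipped
  '[{"@type":"BreadcrumbList"}]',
];

console.log(parseJsonLdBlocks(sampleBlocks).length); // 2
```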
218
+ ### Headings, text & lists
219
+
220
+ ```js
+ scraper.h1(); scraper.h2(); scraper.h3();
+ scraper.h4(); scraper.h5(); scraper.h6();
+ scraper.p();
+
+ scraper.text(); // normalized, whitespace-collapsed visible body text
+ scraper.html(); // raw HTML of the last successful load
+
+ scraper.ul();   // flattened <li> text from every <ul>
+ scraper.ol();   // flattened <li> text from every <ol>
+ ```
231
+
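For readers wondering what "whitespace-collapsed" and "flattened" mean in practice, here is a small sketch under the assumption that runs of whitespace become single spaces and that per-list item arrays are merged into one array. Both helper names are hypothetical, and the package may normalize differently.

```js
// Hypothetical helpers, not the package's actual source.

// Collapse every run of whitespace to one space and trim the ends.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Merge per-list item arrays into a single array of cleaned strings.
function flattenItems(lists) {
  return lists.flat().map(normalizeWhitespace);
}

console.log(normalizeWhitespace("  Hello,\n\t world!  \n\n Second   line. "));
// Hello, world! Second line.
console.log(JSON.stringify(flattenItems([["  One ", "Two\n"], ["Three"]])));
// ["One","Two","Three"]
```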
232
+ ### Images & links
233
+
234
+ ```js
+ scraper.images();         // string[] of img src
+ scraper.image_details();  // [{ url, absolute_url, alt_text, title }]
+
+ scraper.links();          // string[] of href
+ scraper.link_details();
+ // [{ url, absolute_url, protocol, text, title, target, rel,
+ //    is_nofollow, is_ugc, is_noopener, is_noreferrer }]
+ ```
243
+
244
+ ### Convenience
245
+
246
+ ```js
+ scraper.toJSON();
+ // { url, statusCode, title, description, canonical, lang, charset, robots,
+ //   keywords, author, image, favicon, openGraph, twitterCard,
+ //   headings: { h1, h2, h3 }, linkCount, imageCount }
+
+ NodeScraper.scrape(url, options);      // Promise<NodeScraper>
+ NodeScraper.scrapeAll(urls, options);  // Promise<NodeScraper[]>, concurrent
+ ```
255
+
256
+ ---
257
+
258
+ ## 🔍 Custom DOM filtering
259
+
260
+ Use `filter()` to target specific elements and pull nested content out of them.
261
+
262
+ ```js
+ // Single element
+ scraper.filter({
+   element: "div",
+   attributes: { id: "main" },
+   extract: [".title", "#description", "p"],
+ });
+
+ // Multiple elements
+ scraper.filter({
+   element: "div",
+   attributes: { class: "card" },
+   multiple: true,
+   extract: ["h1", ".subtitle", "#meta"],
+ });
+
+ // Plain text instead of HTML
+ scraper.filter({
+   element: "p",
+   attributes: { class: "dark-text" },
+   multiple: true,
+   returnHtml: false,
+ });
+ ```
286
+
287
+ - `extract` accepts tag names, class selectors (`.title`), or ID selectors (`#meta`).
288
+ - Output keys are normalized: `.title` → `class__title`, `#meta` → `id__meta`.
289
+ - With no `extract`, you get the matched element's inner HTML (`returnHtml: true`, the default) or trimmed text (`returnHtml: false`).
290
+ - An invalid selector or no match returns `null` (or `[]` for `multiple: true`) — it never throws.
291
+
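The key normalization described in the list above can be sketched as a small mapping function. `outputKey` is a hypothetical name. The `.title` and `#meta` mappings follow the README, while leaving plain tag names unchanged is an assumption.

```js
// Hypothetical helper, not the package's actual source.
function outputKey(selector) {
  if (selector.startsWith(".")) return `class__${selector.slice(1)}`;
  if (selector.startsWith("#")) return `id__${selector.slice(1)}`;
  return selector; // plain tag name, assumed to be kept as-is
}

console.log(JSON.stringify([".title", "#meta", "p"].map(outputKey)));
// ["class__title","id__meta","p"]
```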
292
+ ---
293
+
294
+ ## 📘 TypeScript
295
+
296
+ Type declarations ship with the package (`types/index.d.ts`, wired up via
297
+ `package.json#types`) — no `@types/` package needed:
298
+
299
+ ```ts
+ import NodeScraper, { ScraperSnapshot, LinkDetails } from "@ioodev/nodescraper";
+
+ const scraper = new NodeScraper("https://example.com");
+ await scraper.init();
+
+ const snapshot: ScraperSnapshot | null = scraper.toJSON();
+ const links: LinkDetails[] | null = scraper.link_details();
+ ```
308
+
309
+ ---
310
+
311
+ ## 📁 Project structure
312
+
313
+ ```
+ nodescraper/
+ ├── .github/
+ │   └── workflows/
+ │       └── test.yml            # CI: runs the test suite on push/PR across Node 16–22
+ ├── examples/
+ │   ├── 01-basic-usage.js
+ │   ├── 02-custom-filter.js
+ │   ├── 03-batch-scraping.js
+ │   └── 04-json-ld-and-extras.js
+ ├── src/
+ │   ├── NodeScraper.js          # main class: all implementation lives here
+ │   ├── constants.js            # default UA, timeout, OG/Twitter property lists
+ │   └── utils.js                # small pure helpers (URL validation, trimming, etc.)
+ ├── test/
+ │   ├── fixtures/
+ │   │   └── sample.html         # HTML fixture used by the test suite
+ │   ├── helpers/
+ │   │   └── test-server.js      # local HTTP server (200/404/redirect/403/slow routes)
+ │   └── nodescraper.test.js     # the test suite itself
+ ├── types/
+ │   └── index.d.ts              # TypeScript declarations
+ ├── index.js                    # entry point: re-exports src/NodeScraper.js
+ ├── package.json
+ ├── CHANGELOG.md
+ ├── README.md
+ └── LICENSE
+ ```
341
+
342
+ `index.js` stays a thin re-export so `require("@ioodev/nodescraper")`
343
+ keeps working exactly as before; all real logic lives under `src/`, which
344
+ keeps the public entry point stable while leaving room to split the
345
+ implementation further (e.g. a `src/extractors/` folder) without touching
346
+ how consumers import the package.
347
+
348
+ ---
349
+
350
+ ## 🧪 Testing
351
+
352
+ The test suite uses Node's built-in test runner — no extra dev dependency
353
+ required.
354
+
355
+ ```bash
+ npm test              # run the suite once
+ npm run test:watch    # re-run on file changes
+ ```
359
+
360
+ It covers:
361
+ - Metadata/OG/Twitter/JSON-LD extraction against a fixture page
362
+ - The bug fixes above (trimmed keywords/viewport, empty `rel`)
363
+ - `filter()`, including the malformed-selector fail-soft path
364
+ - `init()` against a local HTTP server: 200, 404, redirects, UA-blocking, timeouts, and rejected protocols
365
+ - `loadHTML()`, `toJSON()`, and the `scrape()` / `scrapeAll()` static helpers
366
+
367
+ ---
368
+
369
+ ## 💡 Examples
370
+
371
+ Runnable scripts live in [`examples/`](./examples):
372
+
373
+ ```bash
+ npm run example:basic    # metadata + toJSON()
+ npm run example:filter   # filter() single/multiple/text-vs-html
+ npm run example:batch    # scrapeAll() with custom headers/timeout
+ npm run example:extras   # loadHTML(), jsonLd(), favicon(), meta()
+ ```
379
+
380
+ ---
381
+
382
+ ## 🔁 Migrating from 1.0.x
383
+
384
+ No method was renamed or removed, so existing calls keep working as-is.
385
+ Two return values changed because they were bugs, not intentional API:
386
+
387
+ | Method | 1.0.x | 1.1.0 |
388
+ |----------------------------------|--------------------------------------|---------------------------------------|
389
+ | `keywords()` / `viewport()` | entries could have leading spaces | entries are trimmed |
390
+ | `link_details()[i].rel` | `['']` when no `rel` attribute | `[]` when no `rel` attribute |
391
+
392
+ If your code special-cased either of those (e.g. `.map(k => k.trim())` on
393
+ `keywords()`, or checked `rel.length === 1 && rel[0] === ''`), you can drop
394
+ that workaround.
395
+
396
+ Everything else — `loadHTML()`, `meta()`, `lang()`, `robots()`, `favicon()`,
397
+ `jsonLd()`, `text()`, `html()`, `viewport_object()`, `toJSON()`,
398
+ `absolute_url` fields, constructor `options`, and the static `scrape()` /
399
+ `scrapeAll()` helpers — is purely additive.
400
+
401
+ ---
402
+
403
+ ## 📦 Migrating from `@riodevnet/nodescraper`
404
+
405
+ This package used to be published as `@riodevnet/nodescraper`. The code,
406
+ API, and version history are the same — only the npm scope changed.
407
+
408
+ ```diff
+ - npm install @riodevnet/nodescraper
+ + npm install @ioodev/nodescraper
+ ```
412
+
413
+ ```diff
+ - const NodeScraper = require("@riodevnet/nodescraper");
+ + const NodeScraper = require("@ioodev/nodescraper");
+ ```
417
+
418
+ Update any `package.json` dependency entries the same way, then reinstall.
419
+ `@riodevnet/nodescraper` will receive no further updates, so please move to
420
+ `@ioodev/nodescraper` for new fixes and features.
421
+
422
+ ---
423
+
424
+ ## 🤝 Contributing
425
+
426
+ Contributions are welcome! Found a bug or want to request a feature?
427
+ Please open an [issue](https://github.com/ioodev/nodescraper/issues) or
428
+ submit a pull request. Run `npm test` before submitting — CI runs the same
429
+ suite across Node 16, 18, 20, and 22.
430
+
431
+ ---
432
+
433
+ ## 📄 License
434
+
435
+ MIT License © 2025–2026 — NodeScraper
436
+
437
+ ---
438
+
439
+ ## 🔗 Related Projects
440
+
441
+ - [Axios](https://axios-http.com/)
442
+ - [Cheerio](https://cheerio.js.org/)
443
+ - [Node.js](https://nodejs.org/)
444
+
445
+ ---
446
+
447
+ ## 💡 Why NodeScraper?
448
+
449
+ > Think of it as your JavaScript web detective — fast, efficient, and precise.
package/index.js ADDED
@@ -0,0 +1,3 @@
+ 'use strict';
+
+ module.exports = require('./src/NodeScraper');
package/package.json ADDED
@@ -0,0 +1,57 @@
1
+ {
2
+ "name": "@ioodev/nodescraper",
3
+ "version": "1.1.1",
4
+ "description": "NodeScraper is a fast and flexible Node.js web scraping toolkit built using Axios and Cheerio. It provides an intuitive interface for extracting structured HTML and metadata from websites — with clean and consistent outputs.",
5
+ "main": "index.js",
6
+ "types": "types/index.d.ts",
7
+ "files": [
8
+ "index.js",
9
+ "src",
10
+ "types",
11
+ "README.md",
12
+ "CHANGELOG.md",
13
+ "LICENSE"
14
+ ],
15
+ "exports": {
16
+ ".": "./index.js",
17
+ "./package.json": "./package.json"
18
+ },
19
+ "engines": {
20
+ "node": ">=16"
21
+ },
22
+ "scripts": {
23
+ "test": "node --test",
24
+ "test:watch": "node --test --watch",
25
+ "example:basic": "node examples/01-basic-usage.js",
26
+ "example:filter": "node examples/02-custom-filter.js",
27
+ "example:batch": "node examples/03-batch-scraping.js",
28
+ "example:extras": "node examples/04-json-ld-and-extras.js"
29
+ },
30
+ "keywords": [
31
+ "nodescraper",
32
+ "scraper",
33
+ "scraping",
34
+ "web-scraping",
35
+ "html-parser",
36
+ "cheerio",
37
+ "axios",
38
+ "metadata",
39
+ "open-graph",
40
+ "seo"
41
+ ],
42
+ "author": "Rio Agung Purnomo",
43
+ "license": "MIT",
44
+ "type": "commonjs",
45
+ "repository": {
46
+ "type": "git",
47
+ "url": "git+https://github.com/ioodev/nodescraper.git"
48
+ },
49
+ "homepage": "https://github.com/ioodev/nodescraper#readme",
50
+ "bugs": {
51
+ "url": "https://github.com/ioodev/nodescraper/issues"
52
+ },
53
+ "dependencies": {
54
+ "axios": "^1.10.0",
55
+ "cheerio": "^1.1.0"
56
+ }
57
+ }