node-html-parser 4.1.4 → 5.2.0
- package/CHANGELOG.md +28 -0
- package/README.md +34 -25
- package/dist/main.js +177 -271
- package/dist/nodes/html.d.ts +12 -5
- package/dist/nodes/html.js +177 -271
- package/esm/index.js +11 -0
- package/esm/package.json +3 -0
- package/package.json +46 -17
- package/.eslintignore +0 -3
- package/.eslintrc.json +0 -226
- package/.mocharc.yaml +0 -1
- package/dist/esm/back.js +0 -3
- package/dist/esm/index.js +0 -7
- package/dist/esm/matcher.js +0 -101
- package/dist/esm/nodes/comment.js +0 -23
- package/dist/esm/nodes/html.js +0 -1102
- package/dist/esm/nodes/node.js +0 -25
- package/dist/esm/nodes/text.js +0 -95
- package/dist/esm/nodes/type.js +0 -7
- package/dist/esm/parse.js +0 -1
- package/dist/esm/valid.js +0 -9
package/CHANGELOG.md
ADDED
@@ -0,0 +1,28 @@
+# Changelog
+
+All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
+
+## [5.1.0](https://github.com/taoqf/node-fast-html-parser/compare/v4.1.5...v5.1.0) (2021-10-28)
+
+### Features
+
+* Exposed `HTMLElement#rawAttrs` (made public) ([34f1595](https://github.com/taoqf/node-fast-html-parser/commit/34f1595756c0974b6ae7ef5755a615f09e421f32))
+
+## [5.0.0](https://github.com/taoqf/node-fast-html-parser/compare/v4.1.5...v5.0.0) (2021-10-10)
+
+
+### ⚠ BREAKING CHANGES
+
+* Added esm named export support ([0d4b922](https://github.com/taoqf/node-fast-html-parser/commit/0d4b922eefd6210fe802991e464b21b0c69d5f63))
+
+### Features
+
+* Added esm named export support (closes [#160](https://github.com/taoqf/node-fast-html-parser/issues/160) closes [#139](https://github.com/taoqf/node-fast-html-parser/issues/139)) ([0d4b922](https://github.com/taoqf/node-fast-html-parser/commit/0d4b922eefd6210fe802991e464b21b0c69d5f63))
+* Added HTMLElement#getElementsByTagName ([d462e44](https://github.com/taoqf/node-fast-html-parser/commit/d462e449e7ebb00a5a43fb574133681ad5a62475))
+* Improved parsing performance + matching (closes [#164](https://github.com/taoqf/node-fast-html-parser/issues/164)) ([3c5b8e2](https://github.com/taoqf/node-fast-html-parser/commit/3c5b8e2a9104b01a8ca899a7970507463e42adaf))
+
+
+### Bug Fixes
+
+* Add null to return type for HTMLElement#querySelector (closes [#157](https://github.com/taoqf/node-fast-html-parser/issues/157)) ([2b65583](https://github.com/taoqf/node-fast-html-parser/commit/2b655839bd3868c41fb19cae5786ca097565bc7f))
+* blockTextElements incorrectly matching partial tag (detail) (fixes [#156](https://github.com/taoqf/node-fast-html-parser/issues/156) fixes [#124](https://github.com/taoqf/node-fast-html-parser/issues/124)) ([6823349](https://github.com/taoqf/node-fast-html-parser/commit/6823349fdf1809c7484c70d948aa24930ef4983f))
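The 5.0.0 bug-fix entry above adds `null` to the declared return type of `HTMLElement#querySelector`, so callers are expected to guard before using the result. A minimal sketch of such a guard follows. `textOrDefault` is a hypothetical helper, not part of node-html-parser, and `stubRoot` is an assumed stand-in for a parsed tree so the snippet runs without the package installed.

```javascript
// Hypothetical helper (not part of node-html-parser): returns the text of the
// first element matching `selector`, or `fallback` when nothing matches.
function textOrDefault(root, selector, fallback = '') {
  const match = root.querySelector(selector); // may be null when nothing matches
  return match ? match.text : fallback;
}

// Assumed stand-in for a parsed tree. With the package installed, `root` would
// instead come from `require('node-html-parser').parse(html)`.
const stubRoot = {
  querySelector: (selector) =>
    selector === '#list' ? { text: 'Hello World' } : null,
};

console.log(textOrDefault(stubRoot, '#list'));           // Hello World
console.log(textOrDefault(stubRoot, '#missing', 'n/a')); // n/a
```

The helper only relies on `querySelector` and the `text` property, so it should work unchanged on any object exposing that shape.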
package/README.md
CHANGED
@@ -1,4 +1,4 @@
-# Fast HTML Parser [](http://badge.fury.io/js/node-html-parser) [](http://badge.fury.io/js/node-html-parser) [](https://actions-badge.atrox.dev/taoqf/node-html-parser/goto?ref=main)
 
 Fast HTML Parser is a _very fast_ HTML parser. Which will generate a simplified
 DOM tree, with element query support.
@@ -19,15 +19,18 @@ npm install --save node-html-parser
 
 ## Performance
 
-Faster than htmlparser2!
-
 ```shell
-
-
-
-
-htmlparser2 :
-node-html-parser:2.
+cheerio :12.0726 ms/file ± 7.31605
+parse5 :8.18615 ms/file ± 6.15337
+node-html-parser (last release):2.16533 ms/file ± 1.56924
+htmlparser :17.0658 ms/file ± 120.901
+htmlparser2 :2.62695 ms/file ± 4.17579
+node-html-parser:2.14907 ms/file ± 1.66632
+html-parser :24.6505 ms/file ± 18.9996
+htmljs-parser :5.81797 ms/file ± 6.55537
+html-dom-parser :2.52265 ms/file ± 3.54858
+html5parser :2.01144 ms/file ± 2.53570
+high5 :3.91342 ms/file ± 2.65563
 ```
 
 Tested with [htmlparser-benchmark](https://github.com/AndreasMadsen/htmlparser-benchmark).
@@ -70,15 +73,15 @@ var root = HTMLParser.parse('<ul id="list"><li>Hello World</li></ul>');
 
 ### parse(data[, options])
 
-Parse
+Parse the data provided, and return the root of the generated DOM.
 
 - **data**, data to parse
 - **options**, parse options
 
 ```js
 {
-lowerCaseTagName: false, // convert tag name to lower case (
-comment: false // retrieve comments (
+lowerCaseTagName: false, // convert tag name to lower case (hurts performance heavily)
+comment: false, // retrieve comments (hurts performance slightly)
 blockTextElements: {
 script: true, // keep text content when parsing
 noscript: true, // keep text content when parsing
@@ -90,7 +93,7 @@ Parse given data, and return root of the generated DOM.
 
 ### valid(data[, options])
 
-Parse
+Parse the data provided, return true if the given data is valid, and return false if not.
 
 ## HTMLElement Methods
 
@@ -106,12 +109,18 @@ Remove whitespaces in this sub tree.
 
 Query CSS selector to find matching nodes.
 
-Note: Full
+Note: Full range of CSS3 selectors supported since v3.0.0.
 
 ### HTMLElement#querySelector(selector)
 
 Query CSS Selector to find matching node.
 
+### HTMLElement#getElementsByTagName(tagName)
+
+Get all elements with the specified tagName.
+
+Note: Use * for all elements.
+
 ### HTMLElement#closest(selector)
 
 Query closest element by css selector.
@@ -122,7 +131,7 @@ Append a child node to childNodes
 
 ### HTMLElement#insertAdjacentHTML(where, html)
 
-
+Parses the specified text as HTML and inserts the resulting nodes into the DOM tree at a specified position.
 
 ### HTMLElement#setAttribute(key: string, value: string)
 
@@ -180,15 +189,15 @@ Remove class name.
 
 #### HTMLElement#classList.toggle(className: string):void
 
-Toggle class.
+Toggle class. Remove it if it is already included, otherwise add.
 
 #### HTMLElement#classList.contains(className: string): boolean
 
-
+Returns true if the classname is already in the classList.
 
 #### HTMLElement#classList.values()
 
-
+Get class names.
 
 ## HTMLElement Properties
 
@@ -199,28 +208,28 @@ Get unescaped text value of current node and its children. Like `innerText`.
 
 ### HTMLElement#rawText
 
-Get
+Get escaped (as-is) text value of current node and its children. May have
 `&` in it. (fast)
 
 ### HTMLElement#tagName
 
-Get tag name of HTMLElement. Notice: the returned value would be an uppercase string.
+Get or Set tag name of HTMLElement. Notice: the returned value would be an uppercase string.
 
 ### HTMLElement#structuredText
 
-Get structured Text
+Get structured Text.
 
 ### HTMLElement#structure
 
-Get DOM structure
+Get DOM structure.
 
 ### HTMLElement#firstChild
 
-Get first child node
+Get first child node.
 
 ### HTMLElement#lastChild
 
-Get last child node
+Get last child node.
 
 ### HTMLElement#innerHTML
 
@@ -252,4 +261,4 @@ Get all attributes of current element. **Notice: do not try to change the return
 
 ### HTMLElement#range
 
-Corresponding source code start and end indexes (ie [ 0, 40 ])
+Corresponding source code start and end indexes (ie [ 0, 40 ])
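The `classList.toggle`, `classList.contains` and `classList.values` descriptions added in the README hunks above can be illustrated with a small stand-alone model. This is only a sketch of the documented behaviour built on a plain `Set`. `ClassListModel` is a hypothetical name, and nothing here is claimed to match how node-html-parser implements `classList` internally.

```javascript
// Hypothetical stand-in that models the documented classList behaviour.
class ClassListModel {
  constructor(names = []) {
    this.names = new Set(names);
  }

  // Remove the class if it is already included, otherwise add it.
  toggle(className) {
    if (this.names.has(className)) {
      this.names.delete(className);
    } else {
      this.names.add(className);
    }
  }

  // Returns true if the class name is already in the list.
  contains(className) {
    return this.names.has(className);
  }

  // Get class names (as an iterator, in insertion order).
  values() {
    return this.names.values();
  }
}

const classes = new ClassListModel(['a', 'b']);
classes.toggle('a'); // 'a' was present, so it is removed
classes.toggle('c'); // 'c' was absent, so it is added
console.log([...classes.values()]); // [ 'b', 'c' ]
console.log(classes.contains('a')); // false
```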