tag-soup-ng 0.0.1-security → 1.1.21


Potentially problematic release.


This version of tag-soup-ng might be problematic.

package/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2021 Savva Mikhalevski
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md CHANGED
@@ -1,5 +1,285 @@
1
- # Security holding package
1
+ # TagSoup 🍜 [![build](https://github.com/smikhalevski/tag-soup/actions/workflows/master.yml/badge.svg?branch=master&event=push)](https://github.com/smikhalevski/tag-soup/actions/workflows/master.yml)
2
2
 
3
- This package contained malicious code and was removed from the registry by the npm security team. A placeholder was published to ensure users are not affected in the future.
3
+ TagSoup is [the fastest](#performance) pure JS SAX/DOM XML/HTML parser.
4
4
 
5
- Please refer to www.npmjs.com/advisories?search=tag-soup-ng for more information.
5
+ - [It is the fastest](#performance);
6
+ - Tiny and tree-shakable, [just 7 kB gzipped](https://bundlephobia.com/result?p=tag-soup), including dependencies;
7
+ - Streaming support with SAX and DOM parsers for XML and HTML;
8
+ - Extremely low memory consumption;
9
+ - Forgives malformed tag nesting and missing end tags;
10
+ - Parses HTML attributes in the same way your browser does,
11
+ [see tests for more details](https://github.com/smikhalevski/tag-soup/blob/master/src/test/tokenize.test.ts);
12
+ - Recognizes CDATA, processing instructions, and DOCTYPE.
13
+
14
+ ```sh
15
+ npm install --save-prod tag-soup
16
+ ```
17
+
18
+ # Usage
19
+
20
+ ⚠️ [API documentation is available here.](https://smikhalevski.github.io/tag-soup/)
21
+
22
+ ## SAX
23
+
24
+ ```ts
25
+ import {createSaxParser} from 'tag-soup';
26
+
27
+ // Or use
28
+ // import {createXmlSaxParser, createHtmlSaxParser} from 'tag-soup';
29
+
30
+ const saxParser = createSaxParser({
31
+
32
+ startTag(token) {
33
+ console.log(token); // → {tokenType: 1, name: 'foo', …}
34
+ },
35
+
36
+ endTag(token) {
37
+ console.log(token); // → {tokenType: 101, name: 'foo', …}
38
+ },
39
+ });
40
+
41
+ saxParser.parse('<foo>okay');
42
+ ```
43
+
44
+ The SAX parser invokes [callbacks during parsing](https://smikhalevski.github.io/tag-soup/interfaces/isaxhandler.html).
45
+
46
+ Callbacks receive [tokens](https://smikhalevski.github.io/tag-soup/modules.html#token) which represent structures read
47
+ from the input. Tokens are pooled objects, so when a handler callback finishes they are returned to the pool and reused.
48
+ Object pooling drastically reduces memory consumption and allows passing a lot of data to the callback.
49
+
50
+ If you need to retain a token after the callback finishes, use
51
+ [`token.clone()`](https://smikhalevski.github.io/tag-soup/interfaces/itoken.html#clone), which returns a deep copy of
52
+ the token.
53
+
54
+ `startTag` and `endTag` callbacks are always invoked in the correct order even if tags in the input were incorrectly
55
+ nested or missing.
56
+ For [self-closing tags](https://smikhalevski.github.io/tag-soup/interfaces/istarttagtoken.html#selfclosing) only
57
+ the `startTag` callback is invoked.
58
+
59
+ ### Defaults
60
+
61
+ All SAX parser factories accept two arguments:
62
+ [the handler with callbacks](https://smikhalevski.github.io/tag-soup/interfaces/isaxhandler.html) and
63
+ [options](https://smikhalevski.github.io/tag-soup/interfaces/iparseroptions.html). The most generic parser factory
64
+ [`createSaxParser`](https://smikhalevski.github.io/tag-soup/modules.html#createsaxparser) doesn't have any defaults.
65
+
66
+ For [`createXmlSaxParser`](https://smikhalevski.github.io/tag-soup/modules.html#createxmlsaxparser) defaults are
67
+ [`xmlParserOptions`](https://smikhalevski.github.io/tag-soup/modules.html#xmlparseroptions):
68
+
69
+ - CDATA sections, processing instructions and self-closing tags are recognized;
70
+ - XML entities are decoded in text and attribute values;
71
+ - Tag and attribute names are preserved as is.
72
+
73
+ For [`createHtmlSaxParser`](https://smikhalevski.github.io/tag-soup/modules.html#createhtmlsaxparser) defaults are
74
+ [`htmlParserOptions`](https://smikhalevski.github.io/tag-soup/modules.html#htmlparseroptions):
75
+
76
+ - CDATA sections and processing instructions are treated as comments;
77
+ - Self-closing tags are treated as start tags;
78
+ - Tags like `p`, `li`, `td` and others follow implicit end rules, so `<p>foo<p>bar` is parsed as `<p>foo</p><p>bar</p>`;
79
+ - Tag and attribute names are converted to lower case;
80
+ - Legacy HTML entities are decoded in text and attribute values.
81
+
82
+ You can alter how the parser works
83
+ [through options](https://smikhalevski.github.io/tag-soup/interfaces/iparseroptions.html) which give you fine-grained
84
+ control over parsing dialect.
85
+
86
+ By default, TagSoup uses [`speedy-entities`](https://github.com/smikhalevski/speedy-entities#readme) to decode XML and HTML
87
+ entities. A parser created by `createHtmlSaxParser` decodes only legacy HTML entities. This is done to reduce the bundle
88
+ size.
89
+
90
+ To decode [all HTML entities](https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references) use the
91
+ snippet below. It adds 10 kB gzipped to the bundle size.
92
+
93
+ ```ts
94
+ import {decodeHtml} from 'speedy-entities/lib/full';
95
+
96
+ const htmlParser = createHtmlSaxParser({/*callbacks*/}, {
97
+ decodeText: decodeHtml,
98
+ decodeAttribute: decodeHtml,
99
+ });
100
+ ```
101
+
102
+ With `speedy-entities` you can create [a custom decoder](https://github.com/smikhalevski/speedy-entities#custom-decoders)
103
+ that would recognize custom entities.
104
+
105
+ <details>
106
+ <summary>The list of legacy HTML entities</summary>
107
+ <p>
108
+
109
+ > `aacute` `Aacute` `acirc` `Acirc` `acute` `aelig` `AElig` `agrave` `Agrave` `amp` `AMP` `aring` `Aring` `atilde`
110
+ > `Atilde` `auml` `Auml` `brvbar` `ccedil` `Ccedil` `cedil` `cent` `copy` `COPY` `curren` `deg` `divide` `eacute`
111
+ > `Eacute` `ecirc` `Ecirc` `egrave` `Egrave` `eth` `ETH` `euml` `Euml` `frac12` `frac14` `frac34` `gt` `GT` `iacute`
112
+ > `Iacute` `icirc` `Icirc` `iexcl` `igrave` `Igrave` `iquest` `iuml` `Iuml` `laquo` `lt` `LT` `macr` `micro` `middot`
113
+ > `nbsp` `not` `ntilde` `Ntilde` `oacute` `Oacute` `ocirc` `Ocirc` `ograve` `Ograve` `ordf` `ordm` `oslash` `Oslash`
114
+ > `otilde` `Otilde` `ouml` `Ouml` `para` `plusmn` `pound` `quot` `QUOT` `raquo` `reg` `REG` `sect` `shy` `sup1` `sup2`
115
+ > `sup3` `szlig` `thorn` `THORN` `times` `uacute` `Uacute` `ucirc` `Ucirc` `ugrave` `Ugrave` `uml` `uuml` `Uuml`
116
+ > `yacute` `Yacute` `yen` `yuml`
117
+
118
+ </p>
119
+ </details>
120
+
121
+ ### Streaming
122
+
123
+ SAX parsers support streaming. You can use
124
+ [`saxParser.write(chunk)`](https://smikhalevski.github.io/tag-soup/interfaces/iparser.html#write) to parse input data
125
+ chunk by chunk.
126
+
127
+ ```ts
128
+ const saxParser = createSaxParser({/*callbacks*/});
129
+
130
+ saxParser.write('<foo>ok');
131
+ // Triggers startTag callback for "foo" tag.
132
+
133
+ saxParser.write('ay');
134
+ // Doesn't trigger any callbacks.
135
+
136
+ saxParser.write('</foo>');
137
+ // Triggers text callback for "okay" and endTag callback for "foo" tag.
138
+ ```
139
+
140
+ ## DOM
141
+
142
+ ```ts
143
+ import {createDomParser} from 'tag-soup';
144
+
145
+ // Or use
146
+ // import {createXmlDomParser, createHtmlDomParser} from 'tag-soup';
147
+
148
+ // Minimal DOM handler example
149
+ const domParser = createDomParser<any>({
150
+
151
+ element(token) {
152
+ return {tagName: token.name, children: []};
153
+ },
154
+
155
+ appendChild(parentNode, node) {
156
+ parentNode.children.push(node);
157
+ },
158
+ });
159
+
160
+ const domNode = domParser.parse('<foo>okay');
161
+
162
+ console.log(domNode[0].tagName); // → 'foo'
163
+ ```
164
+
165
+ The DOM parser assembles a node tree using a
166
+ [handler](https://smikhalevski.github.io/tag-soup/interfaces/idomhandler.html) that describes how nodes are created and
167
+ appended.
168
+
169
+ The generic parser factory [`createDomParser`](https://smikhalevski.github.io/tag-soup/modules.html#createdomparser)
170
+ requires a [handler](https://smikhalevski.github.io/tag-soup/interfaces/idomhandler.html) to be provided.
171
+
172
+ Both [`createXmlDomParser`](https://smikhalevski.github.io/tag-soup/modules.html#createxmldomparser) and
173
+ [`createHtmlDomParser`](https://smikhalevski.github.io/tag-soup/modules.html#createhtmldomparser) use
174
+ [`domHandler`](https://smikhalevski.github.io/tag-soup/modules.html#domhandler) if no other handler was provided and use
175
+ default options ([`xmlParserOptions`](https://smikhalevski.github.io/tag-soup/modules.html#xmlparseroptions)
176
+ and [`htmlParserOptions`](https://smikhalevski.github.io/tag-soup/modules.html#htmlparseroptions) respectively) which
177
+ [can be overridden](https://smikhalevski.github.io/tag-soup/interfaces/iparseroptions.html).
178
+
179
+ ### Streaming
180
+
181
+ DOM parsers support streaming. You can use
182
+ [`domParser.write(chunk)`](https://smikhalevski.github.io/tag-soup/interfaces/iparser.html#write) to parse input data
183
+ chunk by chunk.
184
+
185
+ ```ts
186
+ const domParser = createXmlDomParser();
187
+
188
+ domParser.write('<foo>ok');
189
+ // → [{nodeType: 1, tagName: 'foo', children: [], …}]
190
+
191
+ domParser.write('ay');
192
+ // → [{nodeType: 1, tagName: 'foo', children: [], …}]
193
+
194
+ domParser.write('</foo>');
195
+ // → [{nodeType: 1, tagName: 'foo', children: [{nodeType: 3, data: 'okay', …}], …}]
196
+ ```
197
+
198
+ # Performance
199
+
200
+ [To run a performance test](./src/test/perf.js) use `npm ci && npm run build && npm run perf`.
201
+
202
+ ## Large input
203
+
204
+ Performance was measured when parsing [the 3.81 MB HTML file](./src/test/test.html).
205
+
206
+ Results are in operations per second. Higher is better.
207
+
208
+ ### SAX benchmark
209
+
210
+ | | Ops/sec |
211
+ | --- | ---: |
212
+ | `createSaxParser` ¹ | 36.3 ± 0.8% |
213
+ | `createXmlSaxParser` ¹ | 30.7 ± 0.5% |
214
+ | `createHtmlSaxParser` ¹ | 23.7 ± 0.5% |
215
+ | `createSaxParser` | 29.2 ± 0.5% |
216
+ | `createXmlSaxParser` | 26.1 ± 0.5% |
217
+ | `createHtmlSaxParser` | 19.9 ± 0.5% |
218
+ | [`@fb55/htmlparser2`](https://github.com/fb55/htmlparser2) | 14.3 ± 0.5% |
219
+ | [`@isaacs/sax-js`](https://github.com/isaacs/sax-js) | 1.7 ± 4.6% |
220
+
221
+ ¹ Parsers were provided a handler with a single
222
+ [`text`](https://smikhalevski.github.io/tag-soup/interfaces/isaxhandler.html#text) callback. This configuration can be
223
+ useful if you want to strip tags from the input.
224
+
225
+ ### DOM benchmark
226
+
227
+ | | Ops/sec |
228
+ | --- | ---: |
229
+ | `createDomParser` | 13.7 ± 0.5% |
230
+ | `createXmlDomParser` | 12.6 ± 0.5% |
231
+ | `createHtmlDomParser` | 10.6 ± 0.5% |
232
+ | [`@fb55/htmlparser2`](https://github.com/fb55/htmlparser2) | 8.4 ± 0.5% |
233
+ | [`@inikulin/parse5`](https://github.com/inikulin/parse5) | 2.8 ± 0.7% |
234
+
235
+ ## Small input
236
+
237
+ The performance was measured when parsing
238
+ [258 files with 95 kB in size on average](https://github.com/AndreasMadsen/htmlparser-benchmark/tree/master/files) from
239
+ [`htmlparser-benchmark`](https://github.com/AndreasMadsen/htmlparser-benchmark).
240
+
241
+ Results are in operations per second. Higher is better.
242
+
243
+ ### SAX benchmark
244
+
245
+ | | Ops/sec |
246
+ | --- | ---: |
247
+ | `createSaxParser` | 1 998.0 ± 0.1% |
248
+ | `createXmlSaxParser` | 1 734.1 ± 0.1% |
249
+ | `createHtmlSaxParser` | 1 285.4 ± 0.1% |
250
+ | [`@fb55/htmlparser2`](https://github.com/fb55/htmlparser2) | 717.5 ± 0.2% |
251
+
252
+ ### DOM benchmark
253
+
254
+ | | Ops/sec |
255
+ | --- | ---: |
256
+ | `createDomParser` | 1 087.1 ± 0.2% |
257
+ | `createXmlDomParser` | 853.5 ± 0.2% |
258
+ | `createHtmlDomParser` | 668.0 ± 0.2% |
259
+ | [`@fb55/htmlparser2`](https://github.com/fb55/htmlparser2) | 457.7 ± 0.2% |
260
+ | [`@inikulin/parse5`](https://github.com/inikulin/parse5) | 50.8 ± 0.4% |
261
+
262
+ # Limitations
263
+
264
+ TagSoup doesn't resolve some weird element structures that malformed HTML may cause.
265
+
266
+ For example, assume the following markup:
267
+
268
+ ```html
269
+ <p><strong>okay
270
+ <p>nope
271
+ ```
272
+
273
+ With [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) this markup would be transformed to:
274
+
275
+ ```html
276
+ <p><strong>okay</strong></p>
277
+ <p><strong>nope</strong></p>
278
+ ```
279
+
280
+ TagSoup doesn't insert the second `strong` tag:
281
+
282
+ ```html
283
+ <p><strong>okay</strong></p>
284
+ <p>nope</p> <!-- Note the absent "strong" tag -->
285
+ ```
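
The README above says tokens are pooled and reused once a handler callback returns, which is why `token.clone()` is needed to keep one. The sketch below illustrates that hazard in isolation. It is not TagSoup's pooling code: `PooledToken`, `createToken`, and `emitAll` are hypothetical names invented for this example, and the real pool may behave differently.

```ts
// Hypothetical illustration of why a pooled token must be cloned before it is retained.
// None of these names come from tag-soup.
interface PooledToken {
  name: string;
  clone(): PooledToken;
}

function createToken(): PooledToken {
  const token: PooledToken = {
    name: '',
    clone() {
      // Copy the current field values into a fresh object that is never reused.
      const copy = createToken();
      copy.name = token.name;
      return copy;
    },
  };
  return token;
}

// Invokes the callback once per name, reusing a single token object for every call.
function emitAll(names: string[], callback: (token: PooledToken) => void): void {
  const shared = createToken();
  for (const name of names) {
    shared.name = name;
    callback(shared);
  }
}

const retained: PooledToken[] = [];
const cloned: PooledToken[] = [];

emitAll(['foo', 'bar'], (token) => {
  retained.push(token);       // keeps a reference to the reused object
  cloned.push(token.clone()); // keeps an independent snapshot
});

const retainedNames = retained.map((token) => token.name); // ['bar', 'bar']
const clonedNames = cloned.map((token) => token.name);     // ['foo', 'bar']
```

Both retained references point at the same object, so they report only the last value written to it, while the clones keep the value seen at callback time.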
@@ -0,0 +1,12 @@
1
+ import { IDomHandler, IParser, IParserOptions } from './parser-types';
2
+ /**
3
+ * Creates a new stateful DOM parser.
4
+ *
5
+ * @template Node The type of object that describes a node in the DOM tree.
6
+ * @template ContainerNode The type of object that describes an element or a document in the DOM tree.
7
+ *
8
+ * @param handler The handler that provides factories and callbacks that produce the DOM tree.
9
+ * @param options The parser options.
10
+ * @returns The new parser that produces a DOM tree during parsing.
11
+ */
12
+ export declare function createDomParser<Node, ContainerNode extends Node>(handler: IDomHandler<Node, ContainerNode>, options?: IParserOptions): IParser<Array<Node>>;
@@ -0,0 +1,84 @@
1
+ import { createSaxParser } from './createSaxParser';
2
+ /**
3
+ * Creates a new stateful DOM parser.
4
+ *
5
+ * @template Node The type of object that describes a node in the DOM tree.
6
+ * @template ContainerNode The type of object that describes an element or a document in the DOM tree.
7
+ *
8
+ * @param handler The handler that provides factories and callbacks that produce the DOM tree.
9
+ * @param options The parser options.
10
+ * @returns The new parser that produces a DOM tree during parsing.
11
+ */
12
+ export function createDomParser(handler, options) {
13
+ var nodes = [];
14
+ var saxParser = createSaxParser(createSaxHandler(nodes, handler, function (node) { return nodes.push(node); }), options);
15
+ var write = function (sourceChunk) {
16
+ saxParser.write(sourceChunk);
17
+ return nodes;
18
+ };
19
+ var parse = function (source) {
20
+ saxParser.parse(source);
21
+ var result = nodes;
22
+ reset();
23
+ return result;
24
+ };
25
+ var reset = function () {
26
+ saxParser.reset();
27
+ nodes = [];
28
+ };
29
+ return {
30
+ write: write,
31
+ parse: parse,
32
+ reset: reset,
33
+ };
34
+ }
35
+ function createSaxHandler(nodes, handler, pushRootNode) {
36
+ var elementFactory = handler.element, containerEndCallback = handler.containerEnd, appendChildCallback = handler.appendChild, textFactory = handler.text, documentFactory = handler.document, commentFactory = handler.comment, processingInstructionFactory = handler.processingInstruction, cdataFactory = handler.cdata, sourceEndCallback = handler.sourceEnd, resetCallback = handler.reset;
37
+ var ancestors = { length: 0 };
38
+ if (typeof elementFactory !== 'function') {
39
+ throw new Error('Missing element factory');
40
+ }
41
+ if (typeof appendChildCallback !== 'function') {
42
+ throw new Error('Missing appendChild callback');
43
+ }
44
+ var pushNode = function (node) {
45
+ if (ancestors.length !== 0) {
46
+ appendChildCallback(ancestors[ancestors.length - 1], node);
47
+ }
48
+ else {
49
+ pushRootNode(node);
50
+ }
51
+ };
52
+ var createDataTokenCallback = function (dataFactory) {
53
+ return dataFactory != null ? function (token) { return pushNode(dataFactory(token)); } : undefined;
54
+ };
55
+ return {
56
+ startTag: function (token) {
57
+ var node = elementFactory(token);
58
+ pushNode(node);
59
+ if (!token.selfClosing) {
60
+ ancestors[ancestors.length++] = node;
61
+ }
62
+ },
63
+ endTag: function (token) {
64
+ --ancestors.length;
65
+ containerEndCallback === null || containerEndCallback === void 0 ? void 0 : containerEndCallback(ancestors[ancestors.length], token);
66
+ },
67
+ doctype: function (token) {
68
+ if (documentFactory && nodes.length === 0) {
69
+ var node = documentFactory(token);
70
+ pushNode(node);
71
+ ancestors[ancestors.length++] = node;
72
+ }
73
+ },
74
+ text: createDataTokenCallback(textFactory),
75
+ processingInstruction: createDataTokenCallback(processingInstructionFactory),
76
+ cdata: createDataTokenCallback(cdataFactory),
77
+ comment: createDataTokenCallback(commentFactory),
78
+ sourceEnd: sourceEndCallback,
79
+ reset: function () {
80
+ ancestors.length = 0;
81
+ resetCallback === null || resetCallback === void 0 ? void 0 : resetCallback();
82
+ },
83
+ };
84
+ }
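
The `createSaxHandler` function in the hunk above builds the tree with a stack of open containers: a start tag creates a node, appends it to the innermost open container (or to the root list when the stack is empty), and is pushed onto the stack unless it is self-closing; an end tag pops the stack. The sketch below reproduces that technique under simplifying assumptions. The `SaxEvent` shape and `buildTree` are hypothetical and stand in for the real SAX parser and its tokens.

```ts
// A self-contained sketch of the ancestor-stack technique.
// The event shape is hypothetical and much simpler than tag-soup's tokens.
type SaxEvent =
  | {type: 'startTag'; name: string; selfClosing?: boolean}
  | {type: 'endTag'}
  | {type: 'text'; data: string};

interface ElementNode {
  tagName: string;
  children: TreeNode[];
}

interface TextNode {
  data: string;
}

type TreeNode = ElementNode | TextNode;

function buildTree(events: SaxEvent[]): TreeNode[] {
  const roots: TreeNode[] = [];
  const ancestors: ElementNode[] = [];

  // Append to the innermost open element, or to the root list if none is open.
  const pushNode = (node: TreeNode): void => {
    if (ancestors.length !== 0) {
      ancestors[ancestors.length - 1].children.push(node);
    } else {
      roots.push(node);
    }
  };

  for (const event of events) {
    switch (event.type) {
      case 'startTag': {
        const element: ElementNode = {tagName: event.name, children: []};
        pushNode(element);
        // Self-closing elements never become containers.
        if (!event.selfClosing) {
          ancestors.push(element);
        }
        break;
      }
      case 'endTag':
        ancestors.pop();
        break;
      case 'text':
        pushNode({data: event.data});
        break;
    }
  }
  return roots;
}

const tree = buildTree([
  {type: 'startTag', name: 'foo'},
  {type: 'text', data: 'okay'},
  {type: 'startTag', name: 'br', selfClosing: true},
  {type: 'endTag'},
  {type: 'text', data: 'tail'},
]);
```

Here `tree` holds the `foo` element, with the text node and the `br` element as its children, followed by a root-level text node for `tail`.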
@@ -0,0 +1,21 @@
1
+ import { IDomHandler, IParser, IParserOptions } from './parser-types';
2
+ import { Node } from './dom-types';
3
+ /**
4
+ * Creates a pre-configured HTML DOM parser that uses {@link domHandler}.
5
+ *
6
+ * @see {@link domHandler}
7
+ */
8
+ export declare function createHtmlDomParser(): IParser<Array<Node>>;
9
+ /**
10
+ * Creates a pre-configured HTML DOM parser.
11
+ *
12
+ * @template Node The type of object that describes a node in the DOM tree.
13
+ * @template ContainerNode The type of object that describes an element or a document in the DOM tree.
14
+ *
15
+ * @param handler The parsing handler.
16
+ * @param options Options that override the defaults.
17
+ *
18
+ * @see {@link domHandler}
19
+ * @see {@link htmlParserOptions}
20
+ */
21
+ export declare function createHtmlDomParser<Node, ContainerNode extends Node>(handler: IDomHandler<Node, ContainerNode>, options?: IParserOptions): IParser<Array<Node>>;
@@ -0,0 +1,7 @@
1
+ import { __assign } from "tslib";
2
+ import { createDomParser } from './createDomParser';
3
+ import { htmlParserOptions } from './createHtmlSaxParser';
4
+ import { domHandler } from './createXmlDomParser';
5
+ export function createHtmlDomParser(handler, options) {
6
+ return createDomParser(handler || domHandler, __assign(__assign({}, htmlParserOptions), options));
7
+ }
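
The `__assign(__assign({}, htmlParserOptions), options)` call above is the compiled form of an object spread, so caller options are layered over the HTML defaults. A minimal sketch of that merge follows; `SketchOptions`, `defaultOptions`, and `mergeOptions` are hypothetical names, and the real `IParserOptions` has more fields.

```ts
// Hypothetical option shape used only for this illustration.
interface SketchOptions {
  renameTag?: (name: string) => string;
  decodeText?: (text: string) => string;
}

const defaultOptions: SketchOptions = {
  renameTag: (name) => name.toLowerCase(),
  decodeText: (text) => text.replace(/&amp;/g, '&'),
};

// Later spreads win: caller-supplied keys override the defaults, all other defaults are kept.
function mergeOptions(options?: SketchOptions): SketchOptions {
  return {...defaultOptions, ...options};
}

const merged = mergeOptions({renameTag: (name) => name.toUpperCase()});

const renamed = merged.renameTag?.('Div');        // 'DIV'
const decoded = merged.decodeText?.('a &amp; b'); // 'a & b'
```

One consequence of this merge style is that a key explicitly set to `undefined` in the caller's options still replaces the default, because own properties are copied regardless of their value.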
@@ -0,0 +1,29 @@
1
+ import { IParser, IParserOptions, ISaxHandler } from './parser-types';
2
+ /**
3
+ * Creates a pre-configured HTML SAX parser.
4
+ *
5
+ * @param handler The parsing handler.
6
+ * @param options Options that override the defaults.
7
+ */
8
+ export declare function createHtmlSaxParser(handler: ISaxHandler, options?: IParserOptions): IParser<void>;
9
+ /**
10
+ * The default HTML parser options:
11
+ * - CDATA sections and processing instructions are treated as comments;
12
+ * - Self-closing tags are treated as start tags;
13
+ * - Tags like `p`, `li`, `td` and others follow implicit end rules, so `<p>foo<p>bar` is parsed as
14
+ * `<p>foo</p><p>bar</p>`;
15
+ * - Tag and attribute names are converted to lower case;
16
+ * - Legacy HTML entities are decoded in text and attribute values. To decode all known HTML entities use:
17
+ *
18
+ * ```ts
19
+ * import {decodeHtml} from 'speedy-entities/lib/full';
20
+ *
21
+ * createHtmlSaxParser({
22
+ * decodeText: decodeHtml,
23
+ * decodeAttribute: decodeHtml,
24
+ * });
25
+ * ```
26
+ *
27
+ * @see {@link https://github.com/smikhalevski/speedy-entities decodeHtml}
28
+ */
29
+ export declare const htmlParserOptions: IParserOptions;
@@ -0,0 +1,120 @@
1
+ import { __assign } from "tslib";
2
+ import { createSaxParser } from './createSaxParser';
3
+ import { decodeHtml } from 'speedy-entities';
4
+ /**
5
+ * Creates a pre-configured HTML SAX parser.
6
+ *
7
+ * @param handler The parsing handler.
8
+ * @param options Options that override the defaults.
9
+ */
10
+ export function createHtmlSaxParser(handler, options) {
11
+ return createSaxParser(handler, __assign(__assign({}, htmlParserOptions), options));
12
+ }
13
+ /**
14
+ * The default HTML parser options:
15
+ * - CDATA sections and processing instructions are treated as comments;
16
+ * - Self-closing tags are treated as start tags;
17
+ * - Tags like `p`, `li`, `td` and others follow implicit end rules, so `<p>foo<p>bar` is parsed as
18
+ * `<p>foo</p><p>bar</p>`;
19
+ * - Tag and attribute names are converted to lower case;
20
+ * - Legacy HTML entities are decoded in text and attribute values. To decode all known HTML entities use:
21
+ *
22
+ * ```ts
23
+ * import {decodeHtml} from 'speedy-entities/lib/full';
24
+ *
25
+ * createHtmlSaxParser({/*callbacks*/}, {
26
+ * decodeText: decodeHtml,
27
+ * decodeAttribute: decodeHtml,
28
+ * });
29
+ * ```
30
+ *
31
+ * @see {@link https://github.com/smikhalevski/speedy-entities decodeHtml}
32
+ */
33
+ export var htmlParserOptions = {
34
+ decodeText: decodeHtml,
35
+ decodeAttribute: decodeHtml,
36
+ renameTag: toLowerCase,
37
+ renameAttribute: toLowerCase,
38
+ checkCdataTag: checkCdataTag,
39
+ checkVoidTag: checkVoidTag,
40
+ endsAncestorAt: endsAncestorAt,
41
+ };
42
+ function toLowerCase(name) {
43
+ return name.toLowerCase();
44
+ }
45
+ function checkCdataTag(token) {
46
+ return cdataTags.has(token.name);
47
+ }
48
+ function checkVoidTag(token) {
49
+ return voidTags.has(token.name);
50
+ }
51
+ function endsAncestorAt(ancestors, token) {
52
+ var tagNames = implicitEndMap.get(token.name);
53
+ if (tagNames) {
54
+ for (var i = ancestors.length - 1; i >= 0; --i) {
55
+ if (tagNames.has(ancestors[i].name)) {
56
+ return i;
57
+ }
58
+ }
59
+ }
60
+ return -1;
61
+ }
62
+ var voidTags = toSet('area base basefont br col command embed frame hr img input isindex keygen link meta param source track wbr');
63
+ var cdataTags = toSet('script style textarea');
64
+ var formTags = toSet('input option optgroup select button datalist textarea');
65
+ var pTags = toSet('p');
66
+ var implicitEndMap = toMap({
67
+ tr: toSet('tr th td'),
68
+ th: toSet('th'),
69
+ td: toSet('thead th td'),
70
+ body: toSet('head link script'),
71
+ li: toSet('li'),
72
+ option: toSet('option'),
73
+ optgroup: toSet('optgroup option'),
74
+ dd: toSet('dt dd'),
75
+ dt: toSet('dt dd'),
76
+ select: formTags,
77
+ input: formTags,
78
+ output: formTags,
79
+ button: formTags,
80
+ datalist: formTags,
81
+ textarea: formTags,
82
+ p: pTags,
83
+ h1: pTags,
84
+ h2: pTags,
85
+ h3: pTags,
86
+ h4: pTags,
87
+ h5: pTags,
88
+ h6: pTags,
89
+ address: pTags,
90
+ article: pTags,
91
+ aside: pTags,
92
+ blockquote: pTags,
93
+ details: pTags,
94
+ div: pTags,
95
+ dl: pTags,
96
+ fieldset: pTags,
97
+ figcaption: pTags,
98
+ figure: pTags,
99
+ footer: pTags,
100
+ form: pTags,
101
+ header: pTags,
102
+ hr: pTags,
103
+ main: pTags,
104
+ nav: pTags,
105
+ ol: pTags,
106
+ pre: pTags,
107
+ section: pTags,
108
+ table: pTags,
109
+ ul: pTags,
110
+ rt: toSet('rt rp'),
111
+ rp: toSet('rt rp'),
112
+ tbody: toSet('thead tbody'),
113
+ tfoot: toSet('thead tbody'),
114
+ });
115
+ function toSet(data) {
116
+ return new Set(data.split(' '));
117
+ }
118
+ function toMap(rec) {
119
+ return new Map(Object.entries(rec));
120
+ }
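
The `endsAncestorAt` option in the hunk above drives the implicit end rules: for an incoming start tag it scans the open ancestors from the innermost outwards and returns the index of the first one the tag should close, or `-1`. The sketch below reproduces the lookup with only two of the rules. `NamedToken` and `implicitEnds` are hypothetical stand-ins, and how the parser acts on the returned index (presumably closing that ancestor and everything nested inside it) is an assumption, not something this hunk shows.

```ts
// A trimmed-down, hypothetical copy of the lookup; only two of the rules are reproduced.
interface NamedToken {
  name: string;
}

const implicitEnds = new Map<string, Set<string>>([
  ['p', new Set(['p'])],
  ['li', new Set(['li'])],
]);

// Returns the index of the nearest open ancestor that the incoming start tag closes, or -1.
function endsAncestorAt(ancestors: NamedToken[], token: NamedToken): number {
  const tagNames = implicitEnds.get(token.name);
  if (tagNames) {
    for (let i = ancestors.length - 1; i >= 0; --i) {
      if (tagNames.has(ancestors[i].name)) {
        return i;
      }
    }
  }
  return -1;
}

// <div><p>foo<p>: the second <p> matches the open <p> at index 1.
const closesParagraph = endsAncestorAt([{name: 'div'}, {name: 'p'}], {name: 'p'});

// <ul><li><p>foo<li>: the new <li> matches the open <li> at index 1.
const closesListItem = endsAncestorAt([{name: 'ul'}, {name: 'li'}, {name: 'p'}], {name: 'li'});

// <div><span>: span has no implicit-end rule in this sketch.
const closesNothing = endsAncestorAt([{name: 'div'}], {name: 'span'});
```

The three calls return `1`, `1`, and `-1` respectively, which matches the README's claim that `<p>foo<p>bar` is treated as two sibling paragraphs.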
@@ -0,0 +1,8 @@
1
+ import { IParser, IParserOptions, ISaxHandler } from './parser-types';
2
+ /**
3
+ * Creates a new stateful SAX parser.
4
+ *
5
+ * @param handler The parsing handler.
6
+ * @param options Parsing options.
7
+ */
8
+ export declare function createSaxParser(handler: ISaxHandler, options?: IParserOptions): IParser<void>;