xml-twig 1.3.12 → 1.3.14
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +10 -11
- package/doc/twig.md +4 -16
- package/package.json +3 -3
- package/twig.js +19 -13
package/README.md

```diff
@@ -18,7 +18,7 @@ When you need to read a XML file, then you have two principles:
 This module tries to combine both principles. The XML document can be read in chunks and within a chunk you have all the nice features and functions you know from a DOM based parser.

 ## Dependencies

-XML documents are read either with [sax](https://www.npmjs.com/package/sax), [node-expat](https://www.npmjs.com/package/node-expat) or [saxophone](https://www.npmjs.com/package/saxophone) parser. More parser may be added in future releases. By default the `sax` parser is used.
+XML documents are read either with [sax](https://www.npmjs.com/package/sax), [node-expat](https://www.npmjs.com/package/node-expat) or [saxophone](https://www.npmjs.com/package/saxophone) parser. More parser may be added in future releases. By default the `sax` parser is used. However, I clearly recommend using the `node-expat` parser. All other parsers I tested, are not compliant to XML standards.

 **NOTE: The `node-expat` and `saxophone` modules are not automatically installed with this module. Install the parser by yourself, if you like to use it**

```
````diff
@@ -33,7 +33,7 @@ npm install node-expat
 npm install saxophone

 ```
-In my tests I parsed a 900 MB big XML file, the `node-expat` is faster than `sax` (node-expat: around 2:30 Minutes, sax: around 3:40 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why `node-expat` parser is not installed automatically. `saxophone` is even a little faster (around 2:10 Minutes) than `node-expat
+In my tests I parsed a 900 MB big XML file, the `node-expat` is faster than `sax` (node-expat: around 2:30 Minutes, sax: around 3:40 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why `node-expat` parser is not installed automatically. `saxophone` is even a little faster (around 2:10 Minutes) than `node-expat`.

 ## How to use it

````
````diff
@@ -57,19 +57,12 @@ API Documentation: see [Twig](./doc/twig.md)
 fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)

 // Output -> <bookstore> finished after 48 lines
-
-
-// Or use a Parser object instead of a Stream - works only with 'expat'!
-const parser = twig.createParser({ tag: twig.Root, function: rootHandler }, { method: 'expat' })
-parser.write('<html><head><title>Hello World</title></head><body><p>Foobar</p></body></html>');
-
-// Output -> xml finished after 1 lines
 ```

 If you prefer [events](https://nodejs.org/api/events.html), then use `event` property instead of `function` in handler declaration:

 ```js
-const parser = twig.createParser({ tag: twig.Root, event: 'rootElement' }, { method: '
+const parser = twig.createParser({ tag: twig.Root, event: 'rootElement' }, { method: 'expat' })
 fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)

 parser.on('rootElement', (elt) => {
````
```diff
@@ -96,6 +89,7 @@ API Documentation: see [Twig](./doc/twig.md)
 { tag: 'book', function: bookHandler },
 { tag: 'ebook', function: bookHandler }
 ];
+handle_book = [ { tag: ['book', 'ebook'], function: bookHandler } ];
 handle_book = { tag: /book$/, function: bookHandler };
 handle_book = [{
 tag: function(name, elt) { return name.endsWith('book') },
```
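The declarations above differ only in how `tag` selects element names, and this release adds a string array as another form. A minimal sketch of the matching rules as the `HandlerCondition` documentation describes them; `matchesTag` is a hypothetical helper, not part of xml-twig, and `Twig.Root` / `Twig.Any` are deliberately left out:

```javascript
/**
 * Hypothetical helper: does an element name satisfy a handler `tag`?
 * Follows the documented HandlerCondition rules; Twig.Root and Twig.Any are not modelled.
 * @param {string|string[]|RegExp|function(string, object): boolean} tag
 * @param {string} name - element name
 * @param {object} [elt] - element handed to a filter function
 * @returns {boolean}
 */
function matchesTag(tag, name, elt) {
  if (typeof tag === 'string') return name === tag;              // exact name
  if (Array.isArray(tag)) return tag.includes(name);             // name is in the list
  if (tag instanceof RegExp) return tag.test(name);              // pattern match
  if (typeof tag === 'function') return tag(name, elt) === true; // filter must return true
  return false;
}

// The array, RegExp and filter declarations from the README select the same names.
const names = ['bookstore', 'book', 'ebook', 'title'];
const byArray = names.filter(n => matchesTag(['book', 'ebook'], n));
const byRegExp = names.filter(n => matchesTag(/book$/, n));
const byFilter = names.filter(n => matchesTag((name) => name.endsWith('book'), n));

console.log(byArray, byRegExp, byFilter); // each is [ 'book', 'ebook' ]
```

Testing `Array.isArray` before `includes` matters: a plain string also has an `includes` method, so `'ebook'.includes('book')` would otherwise count as a match.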
```diff
@@ -392,7 +386,12 @@ This `xml-twig` module focus on reading a XML files. In principle it would be po

 Accessing Twig-Elements by [XML-Path](https://www.w3.org/TR/xpath/) language is not supported. One reason it, the `Twig` class models more an [Element](https://www.w3schools.com/xml/xml_elements.asp) rather than a [Node](https://www.w3schools.com/xml/dom_nodes.asp) which would be more generic.

-
+As already mentioned above, I recommend the `expat` parser. The other parser may work for your purpose, however they have several limitations and bugs:
+
+- `sax` and `saxophone` do not support UTF-16 encoding. I did not test other encodings, because [W3C Recommendations](https://www.w3.org/TR/xml/#charencoding) defines only UTF-8 and UTF-16 as required
+- `sax` misinterpret character entities
+- `saxophone` fails on `<!DOCTYPE>` element
+- Properties `currentLine` and `currentColumn` are not available with `saxophone`



```
package/doc/twig.md

```diff
@@ -53,11 +53,11 @@
 Element can be specified as string, Regular Expression, custom function, <code>Twig.Root</code> or <code>Twig.Any</code><br>
 You can specify a <code>function</code> or a <code>event</code> name</p>
 </dd>
-<dt><a href="#HandlerCondition">HandlerCondition</a> : <code>string</code> | <code>RegExp</code> | <code><a href="#HandlerConditionFilter">HandlerConditionFilter</a></code> | <code><a href="#Root">Root</a></code> | <code><a href="#Any">Any</a></code
+<dt><a href="#HandlerCondition">HandlerCondition</a> : <code>string</code> | <code>Array.<string></code> | <code>RegExp</code> | <code><a href="#HandlerConditionFilter">HandlerConditionFilter</a></code> | <code><a href="#Root">Root</a></code> | <code><a href="#Any">Any</a></code></dt>
 <dd><p>Condition to specify when handler shall be called<br> </p>
 <ul>
-<li>If <code>undefined</code>, then all elements are returned.<br> </li>
 <li>If <code>string</code> then the element name must be equal to the string</li>
+<li>If <code>string[]</code> then the element name must be included in string array</li>
 <li>If <code>RegExp</code> then the element name must match the Regular Expression</li>
 <li>If <a href="#HandlerConditionFilter">HandlerConditionFilter</a> then function must return <code>true</code></li>
 <li>Use <code>Twig.Root</code> to call the handler on root element, i.e. when the end of document is reached</li>
```
```diff
@@ -1510,8 +1510,8 @@ Reference to handler functions for Twig objects.<br>

 <a name="HandlerCondition"></a>

-## HandlerCondition : <code>string</code> \| <code>RegExp</code> \| [<code>HandlerConditionFilter</code>](#HandlerConditionFilter) \| [<code>Root</code>](#Root) \| [<code>Any</code>](#Any)
-Condition to specify when handler shall be called<br>
-If `undefined`, then all elements are returned.<br>
-If `string` then the element name must be equal to the string
-If `RegExp` then the element name must match the Regular Expression
-If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
-Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
-Use `Twig.Any` to call the handler on every element
+## HandlerCondition : <code>string</code> \| <code>Array.<string></code> \| <code>RegExp</code> \| [<code>HandlerConditionFilter</code>](#HandlerConditionFilter) \| [<code>Root</code>](#Root) \| [<code>Any</code>](#Any)
+Condition to specify when handler shall be called<br>
-If `string` then the element name must be equal to the string
-If `string[]` then the element name must be included in string array
-If `RegExp` then the element name must match the Regular Expression
-If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
-Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
-Use `Twig.Any` to call the handler on every element

 **Kind**: global typedef
 <a name="HandlerFunction"></a>
```
package/package.json

```diff
@@ -5,15 +5,15 @@
 },
 "name": "xml-twig",
 "description": "Node module for processing huge XML documents in tree mode",
-"version": "1.3.
+"version": "1.3.14",
 "main": "twig.js",
 "directories": {
 "doc": "doc"
 },
 "devDependencies": {
-"jsdoc-to-markdown": "^8.0.
+"jsdoc-to-markdown": "^8.0.1",
 "luxon": "^3.4.4",
-"node-expat": "^2.4.
+"node-expat": "^2.4.1",
 "saxophone": "^0.8.0"
 },
 "scripts": {
```
package/twig.js

```diff
@@ -1,5 +1,5 @@
 const SAX = 'sax';
-const EXPAT = 'expat';
+const EXPAT = ['expat', 'node-expat'];
 const SAXOPHONE = 'saxophone';

 let tree;
```
```diff
@@ -39,7 +39,8 @@ let current;
 * Other parsers I had a look at:
 * {@link https://www.npmjs.com/package/sax-wasm|sax-wasm}: not a 'stream.Writable'
 * {@link https://www.npmjs.com/package/@rubensworks/saxes|saxes}: not a 'stream.Writable'
-* {@link https://www.npmjs.com/package/node-xml-stream|node-xml-stream}:
+* {@link https://www.npmjs.com/package/node-xml-stream|node-xml-stream}: Lacks comment and processinginstruction and maybe self closing tags
+* {@link https://www.npmjs.com/package/node-xml-stream-parser|node-xml-stream-parser}: Lacks comment and processinginstruction
 * {@link https://www.npmjs.com/package/saxes-stream|saxes-stream}: not a 'stream.Writable'
 * {@link https://www.npmjs.com/package/xml-streamer|xml-streamer}: based on 'node-expat', does not add any benefit
 */
```
```diff
@@ -86,13 +87,13 @@ const Any = new AnyHandler();

 /**
 * Condition to specify when handler shall be called<br>
-* - If `undefined`, then all elements are returned.<br>
 * - If `string` then the element name must be equal to the string
+* - If `string[]` then the element name must be included in string array
 * - If `RegExp` then the element name must match the Regular Expression
 * - If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
 * - Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
 * - Use `Twig.Any` to call the handler on every element
-* @typedef {string|RegExp|HandlerConditionFilter|Root|Any
+* @typedef {string|string[]|RegExp|HandlerConditionFilter|Root|Any} HandlerCondition
 */

 /**
```
```diff
@@ -133,7 +134,6 @@ const Any = new AnyHandler();
 * @returns {external:sax|external:node-expat|external:saxophone} The parser Object
 */

-
 /**
 * Create a new Twig parser
 * @param {TwigHandler|TwigHandler[]} handler - Object or array of element specification and function to handle elements
```
```diff
@@ -146,12 +146,11 @@ function createParser(handler, options = {}) {
 let parser;
 let namespaces = {};

-
-
-
-
-
-}
+const handlerCheck = Array.isArray(handler) ? handler : [handler];
+if (handlerCheck.find(x => x.tag === undefined) != null || handlerCheck.find(x => x.tag.length == 0) != null)
+throw new ReferenceError(`'handler.tag' is not defined`);
+if (options.partial && handlerCheck.find(x => x.tag instanceof AnyHandler) != null)
+console.warn(`Using option '{ partial: true }' and handler '{ tag: Any, function: ${any.function.toString()} }' does not make much sense`);

 // `parser.on("...", err => {...}` does not work, because I need access to 'this'
 if (options.method === SAX) {
```
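The new check in `createParser` normalizes `handler` to an array and rejects entries without a usable `tag`. Two details of the published lines are worth keeping in mind: `x.tag.length == 0` is also true for a filter function declared without parameters, because a function's `length` is its parameter count, and the warning interpolates `any.function`, an identifier that does not appear anywhere else in this diff. A minimal sketch of the same validation with those points addressed; `validateHandlers` is a hypothetical name, `AnyHandler` is a stand-in class, and treating `null` like `undefined` is a choice of this sketch:

```javascript
// Stand-in for xml-twig's AnyHandler (assumption: only `instanceof` checks matter here).
class AnyHandler {}
const Any = new AnyHandler();

/**
 * Hypothetical sketch of the handler check at the start of createParser().
 * @param {object|object[]} handler - one handler entry or an array of entries
 * @param {{partial?: boolean}} [options]
 * @returns {object[]} the entries, always as an array
 * @throws {ReferenceError} if an entry has no tag, or an empty string/array tag
 */
function validateHandlers(handler, options = {}) {
  const handlers = Array.isArray(handler) ? handler : [handler];

  // Only strings and arrays are tested for emptiness: a function's `length`
  // is its parameter count, so `() => true` would otherwise look "empty".
  const missingTag = (tag) =>
    tag == null || ((typeof tag === 'string' || Array.isArray(tag)) && tag.length === 0);
  if (handlers.some(h => missingTag(h.tag)))
    throw new ReferenceError(`'handler.tag' is not defined`);

  // Take the function from the matching entry instead of a separate variable.
  const anyEntry = handlers.find(h => h.tag instanceof AnyHandler);
  if (options.partial && anyEntry !== undefined)
    console.warn(`Using option '{ partial: true }' and handler '{ tag: Any, function: ${String(anyEntry.function)} }' does not make much sense`);

  return handlers;
}

// A zero-parameter filter function passes the check.
validateHandlers({ tag: () => true, function: (elt) => elt });
// Combining `partial` with Any only logs a warning.
validateHandlers({ tag: Any, function: () => {} }, { partial: true });
```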
```diff
@@ -224,7 +223,7 @@ function createParser(handler, options = {}) {
 parser.emit("close");
 });

-} else if (options.method
+} else if (EXPAT.includes(options.method)) {
 parser = require("node-expat").createParser();
 Object.defineProperty(parser, 'currentLine', {
 enumerable: true,
```
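With `EXPAT` turned into an array, `options.method` accepts either `'expat'` or the package name `'node-expat'`, and `EXPAT.includes(options.method)` covers both. A small sketch of that lookup; `resolveParserModule` is hypothetical, falling back to `sax` when `method` is unset follows the README statement that `sax` is the default rather than code visible in this diff, and throwing on unknown names is a choice of the sketch:

```javascript
// Constants as defined at the top of twig.js in 1.3.14.
const SAX = 'sax';
const EXPAT = ['expat', 'node-expat'];
const SAXOPHONE = 'saxophone';

/**
 * Hypothetical helper: which npm package a given `options.method` would load.
 * Assumption: an unset method selects `sax`, the default named in the README.
 * Throwing on unknown names is sketch-only behaviour, not documented by xml-twig.
 * @param {string} [method]
 * @returns {'sax'|'node-expat'|'saxophone'}
 */
function resolveParserModule(method = SAX) {
  if (method === SAX) return 'sax';
  if (EXPAT.includes(method)) return 'node-expat'; // both aliases select node-expat
  if (method === SAXOPHONE) return 'saxophone';
  throw new TypeError(`Unknown parser method '${method}'`);
}

console.log(resolveParserModule('expat'), resolveParserModule('node-expat')); // node-expat node-expat
```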
```diff
@@ -416,6 +415,9 @@ function onStart(binds, node, attrs) {
 if (typeof hndl.tag === 'string' && name === hndl.tag) {
 elt.pin();
 break;
+} else if (typeof Array.isArray(hndl.tag) && hndl.tag.includes(name)) {
+elt.pin();
+break;
 } else if (hndl.tag instanceof RegExp && hndl.tag.test(name)) {
 elt.pin();
 break;
```
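The new branch guards on `typeof Array.isArray(hndl.tag)`. `Array.isArray` returns a boolean, so `typeof` yields the string `'boolean'`, which is always truthy, and the condition reduces to `hndl.tag.includes(name)`. For a string tag that is a substring test (`'ebook'.includes('book')` is `true`), and for a `RegExp` tag that reaches this branch `includes` is not a method, so the call throws a `TypeError`. The snippet below shows this in isolation; `guardAsPublished` and `guardFixed` are illustrative names, and the fix only drops `typeof`:

```javascript
// Illustrative only: the guard as published vs. the same guard without `typeof`.
const guardAsPublished = (tag, name) => typeof Array.isArray(tag) && tag.includes(name);
const guardFixed = (tag, name) => Array.isArray(tag) && tag.includes(name);

console.log(typeof Array.isArray('ebook'));         // boolean (a non-empty, truthy string)
console.log(guardAsPublished('ebook', 'book'));     // true: substring match on a string tag
console.log(guardFixed('ebook', 'book'));           // false
console.log(guardFixed(['book', 'ebook'], 'book')); // true

try {
  guardAsPublished(/book$/, 'book');                // RegExp has no includes()
} catch (err) {
  console.log(err instanceof TypeError);            // true
}
```

The same `typeof Array.isArray(...)` guard appears in the `onClose` hunk of this diff.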
```diff
@@ -429,7 +431,7 @@ function onStart(binds, node, attrs) {
 }

 if (options.xmlns) {
-if (
+if (EXPAT.concat(SAXOPHONE).includes(options.method)) {
 for (let key of Object.keys(attrs).filter(x => x.startsWith('xmlns:')))
 namespaces[key.split(':')[1]] = attrs[key];
 }
```
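For `expat` and `saxophone` the namespace declarations are read from the element's attributes: every `xmlns:prefix` attribute maps `prefix` to its URI. A standalone sketch of that loop; `collectNamespaces` is a hypothetical name and `attrs` is assumed to be a plain object of attribute names to values. As in the original loop, a default namespace declared with a bare `xmlns` attribute is not picked up, because it has no `xmlns:` prefix:

```javascript
/**
 * Hypothetical helper mirroring the xmlns loop in onStart().
 * @param {Object<string, string>} attrs - attribute name -> value
 * @param {Object<string, string>} [namespaces] - map to extend, prefix -> URI
 * @returns {Object<string, string>} the extended map
 */
function collectNamespaces(attrs, namespaces = {}) {
  for (const key of Object.keys(attrs).filter(x => x.startsWith('xmlns:')))
    namespaces[key.split(':')[1]] = attrs[key]; // 'xmlns:dc' -> 'dc'
  return namespaces;
}

const ns = collectNamespaces({
  'xmlns': 'http://example.com/default', // no prefix, so it is ignored
  'xmlns:dc': 'http://purl.org/dc/elements/1.1/',
  'id': 'b1',
});
console.log(ns); // { dc: 'http://purl.org/dc/elements/1.1/' }
```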
```diff
@@ -469,6 +471,10 @@ function onClose(handler, options, name) {
 if (typeof hndl.function === 'function') hndl.function(tree);
 if (typeof hndl.event === 'string') parser.emit(hndl.event, tree);
 purge = false;
+} else if (typeof Array.isArray(hndl.tag) && hndl.tag.includes(name)) {
+if (typeof hndl.function === 'function') hndl.function(current ?? tree);
+if (typeof hndl.event === 'string') parser.emit(hndl.event, current ?? tree);
+purge = false;
 } else if (typeof hndl.tag === 'string' && name === hndl.tag) {
 if (typeof hndl.function === 'function') hndl.function(current ?? tree);
 if (typeof hndl.event === 'string') parser.emit(hndl.event, current ?? tree);
```
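Each matching branch in `onClose` does the same two things: call `hndl.function` if one is given, and emit `hndl.event` on the parser if an event name is given, passing `current ?? tree` (the current element, or the tree when `current` is `null` or `undefined`). This is what lets the README offer `function` and `event` as alternatives in a handler declaration. A minimal sketch with Node's `EventEmitter` standing in for the parser; `dispatch` and the plain-object elements are assumptions for illustration only:

```javascript
const { EventEmitter } = require('node:events');

/**
 * Hypothetical sketch of the per-handler dispatch in onClose().
 * @param {EventEmitter} parser - stands in for the xml-twig parser
 * @param {{function?: Function, event?: string}} hndl - matched handler entry
 * @param {object|null|undefined} current - element just closed, if any
 * @param {object} tree - passed instead when current is null or undefined
 */
function dispatch(parser, hndl, current, tree) {
  const elt = current ?? tree;
  if (typeof hndl.function === 'function') hndl.function(elt);
  if (typeof hndl.event === 'string') parser.emit(hndl.event, elt);
}

const parser = new EventEmitter();
const seen = [];
parser.on('bookElement', (elt) => seen.push(`event:${elt.name}`));

const tree = { name: 'bookstore' };
dispatch(parser, { function: (elt) => seen.push(`function:${elt.name}`) }, { name: 'book' }, tree);
dispatch(parser, { event: 'bookElement' }, null, tree); // current is null, so tree is passed

console.log(seen); // [ 'function:book', 'event:bookstore' ]
```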