xml-twig 1.3.11 → 1.3.13
- package/README.md +10 -4
- package/doc/twig.md +20 -16
- package/package.json +3 -3
- package/samples/memory-test.js +1 -0
- package/twig.js +24 -16
package/README.md
CHANGED
|
@@ -18,7 +18,7 @@ When you need to read a XML file, then you have two principles:
|
|
|
18
18
|
This module tries to combine both principles. The XML document can be read in chunks and within a chunk you have all the nice features and functions you know from a DOM based parser.
|
|
19
19
|
|
|
20
20
|
## Dependencies
|
|
21
|
-
XML documents are read either with [sax](https://www.npmjs.com/package/sax), [node-expat](https://www.npmjs.com/package/node-expat) or [saxophone](https://www.npmjs.com/package/saxophone) parser. More parser may be added in future releases. By default the `sax` parser is used.
|
|
21
|
+
XML documents are read either with [sax](https://www.npmjs.com/package/sax), [node-expat](https://www.npmjs.com/package/node-expat) or [saxophone](https://www.npmjs.com/package/saxophone) parser. More parser may be added in future releases. By default the `sax` parser is used. However, I clearly recommend using the `node-expat` parser. All other parsers I tested, are not compliant to XML standards.
|
|
22
22
|
|
|
23
23
|
**NOTE: The `node-expat` and `saxophone` modules are not automatically installed with this module. Install the parser by yourself, if you like to use it**
|
|
24
24
|
|
|
@@ -33,7 +33,7 @@ npm install node-expat
|
|
|
33
33
|
npm install saxophone
|
|
34
34
|
|
|
35
35
|
```
|
|
36
|
-
In my tests I parsed a 900 MB big XML file, the `node-expat` is faster than `sax` (node-expat: around 2:30 Minutes, sax: around 3:40 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why `node-expat` parser is not installed automatically. `saxophone` is even a little faster (around 2:10 Minutes) than `node-expat
|
|
36
|
+
In my tests I parsed a 900 MB big XML file, the `node-expat` is faster than `sax` (node-expat: around 2:30 Minutes, sax: around 3:40 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why `node-expat` parser is not installed automatically. `saxophone` is even a little faster (around 2:10 Minutes) than `node-expat`.
|
|
37
37
|
|
|
38
38
|
## How to use it
|
|
39
39
|
|
|
@@ -69,7 +69,7 @@ API Documentation: see [Twig](./doc/twig.md)
|
|
|
69
69
|
If you prefer [events](https://nodejs.org/api/events.html), then use `event` property instead of `function` in handler declaration:
|
|
70
70
|
|
|
71
71
|
```js
|
|
72
|
-
const parser = twig.createParser({ tag: twig.Root, event: 'rootElement' }, { method: '
|
|
72
|
+
const parser = twig.createParser({ tag: twig.Root, event: 'rootElement' }, { method: 'expat' })
|
|
73
73
|
fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
|
|
74
74
|
|
|
75
75
|
parser.on('rootElement', (elt) => {
|
|
@@ -96,6 +96,7 @@ API Documentation: see [Twig](./doc/twig.md)
|
|
|
96
96
|
{ tag: 'book', function: bookHandler },
|
|
97
97
|
{ tag: 'ebook', function: bookHandler }
|
|
98
98
|
];
|
|
99
|
+
handle_book = [ { tag: ['book', 'ebook'], function: bookHandler } ];
|
|
99
100
|
handle_book = { tag: /book$/, function: bookHandler };
|
|
100
101
|
handle_book = [{
|
|
101
102
|
tag: function(name, elt) { return name.endsWith('book') },
|
|
@@ -392,7 +393,12 @@ This `xml-twig` module focus on reading a XML files. In principle it would be po
|
|
|
392
393
|
|
|
393
394
|
Accessing Twig-Elements by [XML-Path](https://www.w3.org/TR/xpath/) language is not supported. One reason it, the `Twig` class models more an [Element](https://www.w3schools.com/xml/xml_elements.asp) rather than a [Node](https://www.w3schools.com/xml/dom_nodes.asp) which would be more generic.
|
|
394
395
|
|
|
395
|
-
|
|
396
|
+
As already mentioned above, I recommend the `expat` parser. The other parser may work for your purpose, however they have several limitations and bugs:
|
|
397
|
+
|
|
398
|
+
- `sax` and `saxophone` do not support UTF-16 encoding. I did not test other encodings, because [W3C Recommendations](https://www.w3.org/TR/xml/#charencoding) defines only UTF-8 and UTF-16 as required
|
|
399
|
+
- `sax` misinterpret character entities
|
|
400
|
+
- `saxophone` fails on `<!DOCTYPE>` element
|
|
401
|
+
- Properties `currentLine` and `currentColumn` are not available with `saxophone`
|
|
396
402
|
|
|
397
403
|
|
|
398
404
|
|
package/doc/twig.md
CHANGED
|
@@ -53,11 +53,11 @@
|
|
|
53
53
|
Element can be specified as string, Regular Expression, custom function, <code>Twig.Root</code> or <code>Twig.Any</code><br>
|
|
54
54
|
You can specify a <code>function</code> or a <code>event</code> name</p>
|
|
55
55
|
</dd>
|
|
56
|
-
<dt><a href="#HandlerCondition">HandlerCondition</a> : <code>string</code> | <code>RegExp</code> | <code><a href="#HandlerConditionFilter">HandlerConditionFilter</a></code> | <code><a href="#Root">Root</a></code> | <code><a href="#Any">Any</a></code
|
|
56
|
+
<dt><a href="#HandlerCondition">HandlerCondition</a> : <code>string</code> | <code>Array.<string></code> | <code>RegExp</code> | <code><a href="#HandlerConditionFilter">HandlerConditionFilter</a></code> | <code><a href="#Root">Root</a></code> | <code><a href="#Any">Any</a></code></dt>
|
|
57
57
|
<dd><p>Condition to specify when handler shall be called<br> </p>
|
|
58
58
|
<ul>
|
|
59
|
-
<li>If <code>undefined</code>, then all elements are returned.<br> </li>
|
|
60
59
|
<li>If <code>string</code> then the element name must be equal to the string</li>
|
|
60
|
+
<li>If <code>string[]</code> then the element name must be included in string array</li>
|
|
61
61
|
<li>If <code>RegExp</code> then the element name must match the Regular Expression</li>
|
|
62
62
|
<li>If <a href="#HandlerConditionFilter">HandlerConditionFilter</a> then function must return <code>true</code></li>
|
|
63
63
|
<li>Use <code>Twig.Root</code> to call the handler on root element, i.e. when the end of document is reached</li>
|
|
@@ -141,6 +141,7 @@ You can specify a <code>function</code> or a <code>event</code> name</p>
|
|
|
141
141
|
* [.pinned](#Twig+pinned) ⇒ <code>boolean</code>
|
|
142
142
|
* [.close](#Twig+close)
|
|
143
143
|
* [.debug](#Twig+debug) ⇒ <code>string</code>
|
|
144
|
+
* [.toString](#Twig+toString) ⇒ <code>string</code>
|
|
144
145
|
* [.addChild](#Twig+addChild) ℗
|
|
145
146
|
* [.writer](#Twig+writer) ⇒ [<code>XMLWriter</code>](https://www.npmjs.com/package/xml-writer)
|
|
146
147
|
* [.attr](#Twig+attr) ⇒ <code>string</code> \| <code>number</code> \| <code>object</code>
|
|
@@ -360,6 +361,13 @@ XML-Twig for dummies :-)
|
|
|
360
361
|
|
|
361
362
|
**Kind**: instance property of [<code>Twig</code>](#Twig)
|
|
362
363
|
**Returns**: <code>string</code> - The XML-Tree which is currently available in RAM - no valid XML Structure
|
|
364
|
+
<a name="Twig+toString"></a>
|
|
365
|
+
|
|
366
|
+
### twig.toString ⇒ <code>string</code>
|
|
367
|
+
Returns XML string of the element
|
|
368
|
+
|
|
369
|
+
**Kind**: instance property of [<code>Twig</code>](#Twig)
|
|
370
|
+
**Returns**: <code>string</code> - The XML-Element as string
|
|
363
371
|
<a name="Twig+addChild"></a>
|
|
364
372
|
|
|
365
373
|
### twig.addChild ℗
|
|
@@ -758,6 +766,7 @@ Common function to filter Twig element
|
|
|
758
766
|
* [.pinned](#Twig+pinned) ⇒ <code>boolean</code>
|
|
759
767
|
* [.close](#Twig+close)
|
|
760
768
|
* [.debug](#Twig+debug) ⇒ <code>string</code>
|
|
769
|
+
* [.toString](#Twig+toString) ⇒ <code>string</code>
|
|
761
770
|
* [.addChild](#Twig+addChild) ℗
|
|
762
771
|
* [.writer](#Twig+writer) ⇒ [<code>XMLWriter</code>](https://www.npmjs.com/package/xml-writer)
|
|
763
772
|
* [.attr](#Twig+attr) ⇒ <code>string</code> \| <code>number</code> \| <code>object</code>
|
|
@@ -977,6 +986,13 @@ XML-Twig for dummies :-)
|
|
|
977
986
|
|
|
978
987
|
**Kind**: instance property of [<code>Twig</code>](#Twig)
|
|
979
988
|
**Returns**: <code>string</code> - The XML-Tree which is currently available in RAM - no valid XML Structure
|
|
989
|
+
<a name="Twig+toString"></a>
|
|
990
|
+
|
|
991
|
+
### twig.toString ⇒ <code>string</code>
|
|
992
|
+
Returns XML string of the element
|
|
993
|
+
|
|
994
|
+
**Kind**: instance property of [<code>Twig</code>](#Twig)
|
|
995
|
+
**Returns**: <code>string</code> - The XML-Element as string
|
|
980
996
|
<a name="Twig+addChild"></a>
|
|
981
997
|
|
|
982
998
|
### twig.addChild ℗
|
|
@@ -1494,8 +1510,8 @@ Reference to handler functions for Twig objects.<br>
|
|
|
1494
1510
|
|
|
1495
1511
|
<a name="HandlerCondition"></a>
|
|
1496
1512
|
|
|
1497
|
-
## HandlerCondition : <code>string</code> \| <code>RegExp</code> \| [<code>HandlerConditionFilter</code>](#HandlerConditionFilter) \| [<code>Root</code>](#Root) \| [<code>Any</code>](#Any)
|
|
1498
|
-
Condition to specify when handler shall be called<br>
|
|
1499
|
-
If `undefined`, then all elements are returned.<br>
|
|
1500
|
-
If `string` then the element name must be equal to the string
|
|
1501
|
-
If `RegExp` then the element name must match the Regular Expression
|
|
1502
|
-
If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
|
|
1503
|
-
Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
|
|
1504
|
-
Use `Twig.Any` to call the handler on every element
|
|
1513
|
+
## HandlerCondition : <code>string</code> \| <code>Array.<string></code> \| <code>RegExp</code> \| [<code>HandlerConditionFilter</code>](#HandlerConditionFilter) \| [<code>Root</code>](#Root) \| [<code>Any</code>](#Any)
|
|
1514
|
+
Condition to specify when handler shall be called<br>
|
|
1505
|
-
If `string` then the element name must be equal to the string
|
|
1506
|
-
If `string[]` then the element name must be included in string array
|
|
1507
|
-
If `RegExp` then the element name must match the Regular Expression
|
|
1508
|
-
If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
|
|
1509
|
-
Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
|
|
1510
|
-
Use `Twig.Any` to call the handler on every element
|
|
1511
1515
|
|
|
1512
1516
|
**Kind**: global typedef
|
|
1513
1517
|
<a name="HandlerFunction"></a>
|
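The `HandlerCondition` rules listed in this hunk can be illustrated with a small standalone sketch. `matchesTag` is a hypothetical helper written for this illustration, not a function exported by `xml-twig`. It leaves out `Twig.Root` and `Twig.Any`, which depend on the module's own objects, and it passes only the element name to a filter function, whereas the README example declares the filter as `function(name, elt)`.

```javascript
// Hypothetical helper for illustration only; not part of the xml-twig API.
// Returns true when an element name satisfies a handler's `tag` condition.
function matchesTag(tag, name) {
  if (typeof tag === 'string') return name === tag;          // exact element name
  if (Array.isArray(tag)) return tag.includes(name);         // any name in the list
  if (tag instanceof RegExp) return tag.test(name);          // pattern match
  if (typeof tag === 'function') return tag(name) === true;  // custom filter
  return false;
}

// The four condition styles shown in the README hunk.
const conditions = [
  'book',
  ['book', 'ebook'],
  /book$/,
  (name) => name.endsWith('book')
];

for (const tag of conditions) {
  console.log(matchesTag(tag, 'ebook'));
}
```

For the element name `ebook`, only the plain string `'book'` does not match. The array, the regular expression and the filter function all accept it.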
package/package.json
CHANGED
|
@@ -5,15 +5,15 @@
|
|
|
5
5
|
},
|
|
6
6
|
"name": "xml-twig",
|
|
7
7
|
"description": "Node module for processing huge XML documents in tree mode",
|
|
8
|
-
"version": "1.3.
|
|
8
|
+
"version": "1.3.13",
|
|
9
9
|
"main": "twig.js",
|
|
10
10
|
"directories": {
|
|
11
11
|
"doc": "doc"
|
|
12
12
|
},
|
|
13
13
|
"devDependencies": {
|
|
14
|
-
"jsdoc-to-markdown": "^8.0.
|
|
14
|
+
"jsdoc-to-markdown": "^8.0.1",
|
|
15
15
|
"luxon": "^3.4.4",
|
|
16
|
-
"node-expat": "^2.4.
|
|
16
|
+
"node-expat": "^2.4.1",
|
|
17
17
|
"saxophone": "^0.8.0"
|
|
18
18
|
},
|
|
19
19
|
"scripts": {
|
package/samples/memory-test.js
CHANGED
|
@@ -6,6 +6,7 @@ let Entry = 0;
|
|
|
6
6
|
|
|
7
7
|
let parser = twig.createParser([{ tag: 'Entry', function: EntryHandler }], { method: 'expat' })
|
|
8
8
|
// http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/data/SwissProt/SwissProt.xml.gz
|
|
9
|
+
// For more files see http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html
|
|
9
10
|
let reader = fs.createReadStream(`SwissProt.xml`);
|
|
10
11
|
reader.pipe(parser);
|
|
11
12
|
|
package/twig.js
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
const SAX = 'sax';
|
|
2
|
-
const EXPAT = 'expat';
|
|
2
|
+
const EXPAT = ['expat', 'node-expat'];
|
|
3
3
|
const SAXOPHONE = 'saxophone';
|
|
4
4
|
|
|
5
5
|
let tree;
|
|
@@ -35,10 +35,13 @@ let current;
|
|
|
35
35
|
* @see {@link https://www.npmjs.com/package/libxmljs|libxmljs}
|
|
36
36
|
*/
|
|
37
37
|
|
|
38
|
-
|
|
39
|
-
*
|
|
40
|
-
* @
|
|
41
|
-
*
|
|
38
|
+
/*
|
|
39
|
+
* Other parsers I had a look at:
|
|
40
|
+
* {@link https://www.npmjs.com/package/sax-wasm|sax-wasm}: not a 'stream.Writable'
|
|
41
|
+
* {@link https://www.npmjs.com/package/@rubensworks/saxes|saxes}: not a 'stream.Writable'
|
|
42
|
+
* {@link https://www.npmjs.com/package/node-xml-stream|node-xml-stream}: should work, but not implemented
|
|
43
|
+
* {@link https://www.npmjs.com/package/saxes-stream|saxes-stream}: not a 'stream.Writable'
|
|
44
|
+
* {@link https://www.npmjs.com/package/xml-streamer|xml-streamer}: based on 'node-expat', does not add any benefit
|
|
42
45
|
*/
|
|
43
46
|
|
|
44
47
|
|
|
@@ -83,13 +86,13 @@ const Any = new AnyHandler();
|
|
|
83
86
|
|
|
84
87
|
/**
|
|
85
88
|
* Condition to specify when handler shall be called<br>
|
|
86
|
-
* - If `undefined`, then all elements are returned.<br>
|
|
87
89
|
* - If `string` then the element name must be equal to the string
|
|
90
|
+
* - If `string[]` then the element name must be included in string array
|
|
88
91
|
* - If `RegExp` then the element name must match the Regular Expression
|
|
89
92
|
* - If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
|
|
90
93
|
* - Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
|
|
91
94
|
* - Use `Twig.Any` to call the handler on every element
|
|
92
|
-
* @typedef {string|RegExp|HandlerConditionFilter|Root|Any
|
|
95
|
+
* @typedef {string|string[]|RegExp|HandlerConditionFilter|Root|Any} HandlerCondition
|
|
93
96
|
*/
|
|
94
97
|
|
|
95
98
|
/**
|
|
@@ -130,7 +133,6 @@ const Any = new AnyHandler();
|
|
|
130
133
|
* @returns {external:sax|external:node-expat|external:saxophone} The parser Object
|
|
131
134
|
*/
|
|
132
135
|
|
|
133
|
-
|
|
134
136
|
/**
|
|
135
137
|
* Create a new Twig parser
|
|
136
138
|
* @param {TwigHandler|TwigHandler[]} handler - Object or array of element specification and function to handle elements
|
|
@@ -143,12 +145,11 @@ function createParser(handler, options = {}) {
|
|
|
143
145
|
let parser;
|
|
144
146
|
let namespaces = {};
|
|
145
147
|
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
}
|
|
148
|
+
const handlerCheck = Array.isArray(handler) ? handler : [handler];
|
|
149
|
+
if (handlerCheck.find(x => x.tag === undefined) != null || handlerCheck.find(x => x.tag.length == 0) != null)
|
|
150
|
+
throw new ReferenceError(`'handler.tag' is not defined`);
|
|
151
|
+
if (options.partial && handlerCheck.find(x => x.tag instanceof AnyHandler) != null)
|
|
152
|
+
console.warn(`Using option '{ partial: true }' and handler '{ tag: Any, function: ${any.function.toString()} }' does not make much sense`);
|
|
152
153
|
|
|
153
154
|
// `parser.on("...", err => {...}` does not work, because I need access to 'this'
|
|
154
155
|
if (options.method === SAX) {
|
|
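The check added at the top of `createParser` in this hunk rejects handlers without a usable `tag`. A minimal standalone sketch of that idea follows. `assertHandlerTags` is a hypothetical name, and restricting the emptiness test to strings and arrays is an assumption of this sketch rather than what the diff shows.

```javascript
// Hypothetical helper for illustration; xml-twig performs its check inline in createParser.
function assertHandlerTags(handler) {
  // Accept a single handler object or an array of handlers.
  const handlers = Array.isArray(handler) ? handler : [handler];
  for (const h of handlers) {
    const tag = h.tag;
    // Assumption of this sketch: only strings and arrays are tested for emptiness.
    const isEmpty = (typeof tag === 'string' || Array.isArray(tag)) && tag.length === 0;
    if (tag === undefined || isEmpty)
      throw new ReferenceError(`'handler.tag' is not defined`);
  }
  return handlers;
}

try {
  // The second handler has an empty tag list, so the check throws.
  assertHandlerTags([{ tag: 'book' }, { tag: [] }]);
} catch (err) {
  console.log(err.name, err.message);
}
```

Limiting the length test to strings and arrays avoids rejecting a filter function declared without parameters, whose `length` property is also `0`.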
@@ -221,7 +222,7 @@ function createParser(handler, options = {}) {
|
|
|
221
222
|
parser.emit("close");
|
|
222
223
|
});
|
|
223
224
|
|
|
224
|
-
} else if (options.method
|
|
225
|
+
} else if (EXPAT.includes(options.method)) {
|
|
225
226
|
parser = require("node-expat").createParser();
|
|
226
227
|
Object.defineProperty(parser, 'currentLine', {
|
|
227
228
|
enumerable: true,
|
|
@@ -413,6 +414,9 @@ function onStart(binds, node, attrs) {
|
|
|
413
414
|
if (typeof hndl.tag === 'string' && name === hndl.tag) {
|
|
414
415
|
elt.pin();
|
|
415
416
|
break;
|
|
417
|
+
} else if (typeof Array.isArray(hndl.tag) && hndl.tag.includes(name)) {
|
|
418
|
+
elt.pin();
|
|
419
|
+
break;
|
|
416
420
|
} else if (hndl.tag instanceof RegExp && hndl.tag.test(name)) {
|
|
417
421
|
elt.pin();
|
|
418
422
|
break;
|
|
@@ -426,7 +430,7 @@ function onStart(binds, node, attrs) {
|
|
|
426
430
|
}
|
|
427
431
|
|
|
428
432
|
if (options.xmlns) {
|
|
429
|
-
if (
|
|
433
|
+
if (EXPAT.concat(SAXOPHONE).includes(options.method)) {
|
|
430
434
|
for (let key of Object.keys(attrs).filter(x => x.startsWith('xmlns:')))
|
|
431
435
|
namespaces[key.split(':')[1]] = attrs[key];
|
|
432
436
|
}
|
|
@@ -466,6 +470,10 @@ function onClose(handler, options, name) {
|
|
|
466
470
|
if (typeof hndl.function === 'function') hndl.function(tree);
|
|
467
471
|
if (typeof hndl.event === 'string') parser.emit(hndl.event, tree);
|
|
468
472
|
purge = false;
|
|
473
|
+
} else if (typeof Array.isArray(hndl.tag) && hndl.tag.includes(name)) {
|
|
474
|
+
if (typeof hndl.function === 'function') hndl.function(current ?? tree);
|
|
475
|
+
if (typeof hndl.event === 'string') parser.emit(hndl.event, current ?? tree);
|
|
476
|
+
purge = false;
|
|
469
477
|
} else if (typeof hndl.tag === 'string' && name === hndl.tag) {
|
|
470
478
|
if (typeof hndl.function === 'function') hndl.function(current ?? tree);
|
|
471
479
|
if (typeof hndl.event === 'string') parser.emit(hndl.event, current ?? tree);
|
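The change from a single `EXPAT` string to an array makes both `'expat'` and `'node-expat'` select the same parser. A minimal sketch of that lookup follows, with the three constants copied from the diff. `resolveParser` is a hypothetical name used only here, and throwing on an unknown method is an assumption of the sketch, not behaviour shown in the diff.

```javascript
const SAX = 'sax';
const EXPAT = ['expat', 'node-expat'];
const SAXOPHONE = 'saxophone';

// Hypothetical helper: maps the `method` option to the npm package that would be loaded.
// Treating an omitted method as 'sax' follows the README's statement that sax is the default.
function resolveParser(method = SAX) {
  if (method === SAX) return 'sax';
  if (EXPAT.includes(method)) return 'node-expat';
  if (method === SAXOPHONE) return 'saxophone';
  // Assumption: unknown methods are rejected.
  throw new Error(`Unknown parser method '${method}'`);
}

console.log(resolveParser('expat'));
console.log(resolveParser('node-expat'));

// A string constant can be appended to the alias array for a single membership test,
// as the namespace hunk does with EXPAT.concat(SAXOPHONE).
console.log(EXPAT.concat(SAXOPHONE).includes('saxophone'));
```

Keeping the aliases in one array means a new spelling only has to be added in one place, and every `includes` check picks it up.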