xml-twig 1.3.12 → 1.3.13

Files changed (4)
  1. package/README.md +10 -4
  2. package/doc/twig.md +4 -16
  3. package/package.json +3 -3
  4. package/twig.js +17 -12
package/README.md CHANGED
@@ -18,7 +18,7 @@ When you need to read a XML file, then you have two principles:
  This module tries to combine both principles. The XML document can be read in chunks and within a chunk you have all the nice features and functions you know from a DOM based parser.
 
  ## Dependencies
- XML documents are read either with [sax](https://www.npmjs.com/package/sax), [node-expat](https://www.npmjs.com/package/node-expat) or [saxophone](https://www.npmjs.com/package/saxophone) parser. More parser may be added in future releases. By default the `sax` parser is used.
+ XML documents are read either with [sax](https://www.npmjs.com/package/sax), [node-expat](https://www.npmjs.com/package/node-expat) or [saxophone](https://www.npmjs.com/package/saxophone) parser. More parser may be added in future releases. By default the `sax` parser is used. However, I clearly recommend using the `node-expat` parser. All other parsers I tested, are not compliant to XML standards.
 
  **NOTE: The `node-expat` and `saxophone` modules are not automatically installed with this module. Install the parser by yourself, if you like to use it**
 
@@ -33,7 +33,7 @@ npm install node-expat
  npm install saxophone
 
  ```
- In my tests I parsed a 900 MB big XML file, the `node-expat` is faster than `sax` (node-expat: around 2:30 Minutes, sax: around 3:40 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why `node-expat` parser is not installed automatically. `saxophone` is even a little faster (around 2:10 Minutes) than `node-expat`, however `saxophone` lacks some functions and is not fully compliant to XML standards.
+ In my tests I parsed a 900 MB big XML file, the `node-expat` is faster than `sax` (node-expat: around 2:30 Minutes, sax: around 3:40 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why `node-expat` parser is not installed automatically. `saxophone` is even a little faster (around 2:10 Minutes) than `node-expat`.
 
  ## How to use it
 
@@ -69,7 +69,7 @@ API Documentation: see [Twig](./doc/twig.md)
  If you prefer [events](https://nodejs.org/api/events.html), then use `event` property instead of `function` in handler declaration:
 
  ```js
- const parser = twig.createParser({ tag: twig.Root, event: 'rootElement' }, { method: 'sax' })
+ const parser = twig.createParser({ tag: twig.Root, event: 'rootElement' }, { method: 'expat' })
  fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
 
  parser.on('rootElement', (elt) => {
@@ -96,6 +96,7 @@ API Documentation: see [Twig](./doc/twig.md)
  { tag: 'book', function: bookHandler },
  { tag: 'ebook', function: bookHandler }
  ];
+ handle_book = [ { tag: ['book', 'ebook'], function: bookHandler } ];
  handle_book = { tag: /book$/, function: bookHandler };
  handle_book = [{
  tag: function(name, elt) { return name.endsWith('book') },
@@ -392,7 +393,12 @@ This `xml-twig` module focus on reading a XML files. In principle it would be po
 
  Accessing Twig-Elements by [XML-Path](https://www.w3.org/TR/xpath/) language is not supported. One reason it, the `Twig` class models more an [Element](https://www.w3schools.com/xml/xml_elements.asp) rather than a [Node](https://www.w3schools.com/xml/dom_nodes.asp) which would be more generic.
 
- Despite [W3C Recommendations](https://www.w3.org/TR/xml/#charencoding) ("All XML processors MUST be able to read entities in both the UTF-8 and UTF-16 encodings"), the `sax` and `saxophone` parsers do not support UTF-16 encodings. When you have a XML-File encoded in UTF-16, then you must use the `expat` parser.
+ As already mentioned above, I recommend the `expat` parser. The other parser may work for your purpose, however they have several limitations and bugs:
+
+ - `sax` and `saxophone` do not support UTF-16 encoding. I did not test other encodings, because [W3C Recommendations](https://www.w3.org/TR/xml/#charencoding) defines only UTF-8 and UTF-16 as required
+ - `sax` misinterpret character entities
+ - `saxophone` fails on `<!DOCTYPE>` element
+ - Properties `currentLine` and `currentColumn` are not available with `saxophone`
 
 
 
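Reviewer note: the new README text says that `sax` and `saxophone` do not handle UTF-16 input, so a caller may want to pick the parser method based on the file's byte order mark. The following is only a sketch under that assumption; `chooseParserMethod` is a hypothetical helper and not part of xml-twig, and files without a byte order mark are not detected.

```javascript
// Hypothetical helper (not part of xml-twig): pick a parser method
// from the first bytes of an XML file.
function chooseParserMethod(firstBytes) {
  const hasTwoBytes = firstBytes.length >= 2;
  // UTF-16 little endian starts with FF FE, big endian with FE FF.
  const isUtf16LE = hasTwoBytes && firstBytes[0] === 0xff && firstBytes[1] === 0xfe;
  const isUtf16BE = hasTwoBytes && firstBytes[0] === 0xfe && firstBytes[1] === 0xff;
  // Per the README, UTF-16 needs the expat parser; otherwise keep the default.
  return (isUtf16LE || isUtf16BE) ? 'expat' : 'sax';
}

const utf16Sample = Buffer.from([0xff, 0xfe, 0x3c, 0x00]); // BOM followed by '<'
const utf8Sample = Buffer.from('<?xml version="1.0"?>', 'utf8');

console.log(chooseParserMethod(utf16Sample)); // expat
console.log(chooseParserMethod(utf8Sample));  // sax
```

The returned string could then be passed as `{ method: ... }` to `twig.createParser`, as in the README example above.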
package/doc/twig.md CHANGED
@@ -53,11 +53,11 @@
  Element can be specified as string, Regular Expression, custom function, <code>Twig.Root</code> or <code>Twig.Any</code><br>
  You can specify a <code>function</code> or a <code>event</code> name</p>
  </dd>
- <dt><a href="#HandlerCondition">HandlerCondition</a> : <code>string</code> | <code>RegExp</code> | <code><a href="#HandlerConditionFilter">HandlerConditionFilter</a></code> | <code><a href="#Root">Root</a></code> | <code><a href="#Any">Any</a></code> | <code>undefined</code></dt>
+ <dt><a href="#HandlerCondition">HandlerCondition</a> : <code>string</code> | <code>Array.&lt;string&gt;</code> | <code>RegExp</code> | <code><a href="#HandlerConditionFilter">HandlerConditionFilter</a></code> | <code><a href="#Root">Root</a></code> | <code><a href="#Any">Any</a></code></dt>
  <dd><p>Condition to specify when handler shall be called<br> </p>
  <ul>
- <li>If <code>undefined</code>, then all elements are returned.<br> </li>
  <li>If <code>string</code> then the element name must be equal to the string</li>
+ <li>If <code>string[]</code> then the element name must be included in string array</li>
  <li>If <code>RegExp</code> then the element name must match the Regular Expression</li>
  <li>If <a href="#HandlerConditionFilter">HandlerConditionFilter</a> then function must return <code>true</code></li>
  <li>Use <code>Twig.Root</code> to call the handler on root element, i.e. when the end of document is reached</li>
@@ -1510,8 +1510,8 @@ Reference to handler functions for Twig objects.<br>
 
  <a name="HandlerCondition"></a>
 
- ## HandlerCondition : <code>string</code> \| <code>RegExp</code> \| [<code>HandlerConditionFilter</code>](#HandlerConditionFilter) \| [<code>Root</code>](#Root) \| [<code>Any</code>](#Any) \| <code>undefined</code>
- Condition to specify when handler shall be called<br>
- If `undefined`, then all elements are returned.<br>
- If `string` then the element name must be equal to the string
- If `RegExp` then the element name must match the Regular Expression
- If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
- Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
- Use `Twig.Any` to call the handler on every element
+ ## HandlerCondition : <code>string</code> \| <code>Array.&lt;string&gt;</code> \| <code>RegExp</code> \| [<code>HandlerConditionFilter</code>](#HandlerConditionFilter) \| [<code>Root</code>](#Root) \| [<code>Any</code>](#Any)
+ Condition to specify when handler shall be called<br>
- If `string` then the element name must be equal to the string
- If `string[]` then the element name must be included in string array
- If `RegExp` then the element name must match the Regular Expression
- If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
- Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
- Use `Twig.Any` to call the handler on every element
 
  **Kind**: global typedef
  <a name="HandlerFunction"></a>
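Reviewer note: the documented `HandlerCondition` rules (string, string array, RegExp, filter function) can be illustrated with a small stand-alone matcher. This is a minimal sketch of the rules as the documentation states them, not the package's internal code; `matchesCondition` is a hypothetical name, and the `Twig.Root` and `Twig.Any` cases are left out.

```javascript
// Hypothetical stand-alone matcher mirroring the documented rules.
// xml-twig's own implementation may differ.
function matchesCondition(tag, name, elt) {
  if (typeof tag === 'string') return name === tag;           // exact name
  if (Array.isArray(tag)) return tag.includes(name);          // name listed in array
  if (tag instanceof RegExp) return tag.test(name);           // name matches pattern
  if (typeof tag === 'function') return tag(name, elt) === true; // filter returns true
  return false;                                                // undefined or unknown type
}

console.log(matchesCondition('book', 'book'));                          // true
console.log(matchesCondition(['book', 'ebook'], 'ebook'));              // true
console.log(matchesCondition(/book$/, 'ebook'));                        // true
console.log(matchesCondition((n) => n.endsWith('book'), 'notebook'));   // true
console.log(matchesCondition('ebook', 'book'));                         // false
```

Checking `typeof tag === 'string'` before `Array.isArray(tag)` keeps string tags on exact comparison, since strings also have an `includes` method.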
package/package.json CHANGED
@@ -5,15 +5,15 @@
  },
  "name": "xml-twig",
  "description": "Node module for processing huge XML documents in tree mode",
- "version": "1.3.12",
+ "version": "1.3.13",
  "main": "twig.js",
  "directories": {
  "doc": "doc"
  },
  "devDependencies": {
- "jsdoc-to-markdown": "^8.0.0",
+ "jsdoc-to-markdown": "^8.0.1",
  "luxon": "^3.4.4",
- "node-expat": "^2.4.0",
+ "node-expat": "^2.4.1",
  "saxophone": "^0.8.0"
  },
  "scripts": {
package/twig.js CHANGED
@@ -1,5 +1,5 @@
  const SAX = 'sax';
- const EXPAT = 'expat';
+ const EXPAT = ['expat', 'node-expat'];
  const SAXOPHONE = 'saxophone';
 
  let tree;
@@ -86,13 +86,13 @@ const Any = new AnyHandler();
 
  /**
  * Condition to specify when handler shall be called<br>
- * - If `undefined`, then all elements are returned.<br>
  * - If `string` then the element name must be equal to the string
+ * - If `string[]` then the element name must be included in string array
  * - If `RegExp` then the element name must match the Regular Expression
  * - If [HandlerConditionFilter](#HandlerConditionFilter) then function must return `true`
  * - Use `Twig.Root` to call the handler on root element, i.e. when the end of document is reached
  * - Use `Twig.Any` to call the handler on every element
- * @typedef {string|RegExp|HandlerConditionFilter|Root|Any|undefined} HandlerCondition
+ * @typedef {string|string[]|RegExp|HandlerConditionFilter|Root|Any} HandlerCondition
  */
 
  /**
@@ -133,7 +133,6 @@ const Any = new AnyHandler();
  * @returns {external:sax|external:node-expat|external:saxophone} The parser Object
  */
 
-
  /**
  * Create a new Twig parser
  * @param {TwigHandler|TwigHandler[]} handler - Object or array of element specification and function to handle elements
@@ -146,12 +145,11 @@ function createParser(handler, options = {}) {
  let parser;
  let namespaces = {};
 
- if (options.partial) {
- const handle1 = Array.isArray(handler) ? handler : [handler];
- let any = handle1.find(x => x.tag instanceof AnyHandler);
- if (any !== undefined)
- console.warn(`Using option '{ partial: true }' and handler '{ tag: Any, function: ${any.function.toString()} }' does not make much sense`);
- }
+ const handlerCheck = Array.isArray(handler) ? handler : [handler];
+ if (handlerCheck.find(x => x.tag === undefined) != null || handlerCheck.find(x => x.tag.length == 0) != null)
+ throw new ReferenceError(`'handler.tag' is not defined`);
+ if (options.partial && handlerCheck.find(x => x.tag instanceof AnyHandler) != null)
+ console.warn(`Using option '{ partial: true }' and handler '{ tag: Any, function: ${any.function.toString()} }' does not make much sense`);
 
  // `parser.on("...", err => {...}` does not work, because I need access to 'this'
  if (options.method === SAX) {
@@ -224,7 +222,7 @@ function createParser(handler, options = {}) {
  parser.emit("close");
  });
 
- } else if (options.method === EXPAT) {
+ } else if (EXPAT.includes(options.method)) {
  parser = require("node-expat").createParser();
  Object.defineProperty(parser, 'currentLine', {
  enumerable: true,
@@ -416,6 +414,9 @@ function onStart(binds, node, attrs) {
  if (typeof hndl.tag === 'string' && name === hndl.tag) {
  elt.pin();
  break;
+ } else if (typeof Array.isArray(hndl.tag) && hndl.tag.includes(name)) {
+ elt.pin();
+ break;
  } else if (hndl.tag instanceof RegExp && hndl.tag.test(name)) {
  elt.pin();
  break;
@@ -429,7 +430,7 @@ function onStart(binds, node, attrs) {
  }
 
  if (options.xmlns) {
- if ([EXPAT, SAXOPHONE].includes(options.method)) {
+ if (EXPAT.concat(SAXOPHONE).includes(options.method)) {
  for (let key of Object.keys(attrs).filter(x => x.startsWith('xmlns:')))
  namespaces[key.split(':')[1]] = attrs[key];
  }
@@ -469,6 +470,10 @@ function onClose(handler, options, name) {
  if (typeof hndl.function === 'function') hndl.function(tree);
  if (typeof hndl.event === 'string') parser.emit(hndl.event, tree);
  purge = false;
+ } else if (typeof Array.isArray(hndl.tag) && hndl.tag.includes(name)) {
+ if (typeof hndl.function === 'function') hndl.function(current ?? tree);
+ if (typeof hndl.event === 'string') parser.emit(hndl.event, current ?? tree);
+ purge = false;
  } else if (typeof hndl.tag === 'string' && name === hndl.tag) {
  if (typeof hndl.function === 'function') hndl.function(current ?? tree);
  if (typeof hndl.event === 'string') parser.emit(hndl.event, current ?? tree);
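Reviewer note: the added branches in `onStart` and `onClose` guard with `typeof Array.isArray(hndl.tag)`. In JavaScript, `typeof` applied to a boolean yields the string `'boolean'`, which is always truthy, so the guard may not narrow to arrays as intended. The sketch below demonstrates the effect in isolation; `looseArrayMatch` and `strictArrayMatch` are hypothetical names, and the second function is only one possible alternative, not the package's code.

```javascript
// Guard shaped like the one in the diff: `typeof` yields the string
// 'boolean', so the left operand is always truthy.
function looseArrayMatch(tag, name) {
  return Boolean(typeof Array.isArray(tag) && tag.includes(name));
}

// Possible alternative (an assumption): test the boolean result itself.
function strictArrayMatch(tag, name) {
  return Array.isArray(tag) && tag.includes(name);
}

console.log(typeof Array.isArray(['book']));               // boolean
console.log(looseArrayMatch('ebook', 'book'));             // true (substring test on a string)
console.log(strictArrayMatch('ebook', 'book'));            // false
console.log(strictArrayMatch(['book', 'ebook'], 'book'));  // true

let threw = false;
try {
  looseArrayMatch(/book$/, 'book');  // RegExp has no includes() method
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw);                                        // true
```

As far as the visible hunks show, the array branch in `onClose` sits before the string branch, so a string tag could be matched by substring there. Separately, the warning in `createParser` still interpolates `any.function` although the `let any = ...` declaration is on a removed line; unless `any` is defined elsewhere in the file, evaluating that template would raise a ReferenceError.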