xml-twig 1.7.13 → 1.9.1

Files changed (4)
  1. package/README.md +18 -18
  2. package/doc/twig.md +67 -3
  3. package/package.json +6 -7
  4. package/twig.js +41 -4
package/README.md CHANGED
@@ -4,27 +4,29 @@ Node module for processing huge XML documents in tree mode
4
4
  Inspired by Perl module [XML::Twig](https://metacpan.org/pod/XML::Twig)
5
5
 
6
6
 
7
- ## When should I use this, motivation of this module
8
- When you need to read a XML file, then you have two principles:
7
+ ## When to Use This Module and Its Motivation
8
+ When you need to read an XML file, there are two primary approaches:
9
9
 
10
- * The **Document Object Model (DOM)** style. These parser read the entire XML document into memory. Usually they provide easy methods to navigate in the document tree or make modifications.
10
+ 1. **The Document Object Model (DOM) Style**
11
11
 
12
- DOM parsers are perfect for rather small files, for example configuration files or (X-)HTML pages. However, for bigger XML files you may run into memory limits. When you parse a XML-File as DOM, then the footprint in RAM can be easily 10-20 times the size of the raw XML-String. If the XML-File is greater than [Buffer.constants.MAX_STRING_LENGTH](https://nodejs.org/api/buffer.html#bufferconstantsmax_string_length) (typically 512 MiB), then a DOM parser may throw error "Cannot create a string longer than 0x1fffffe8 characters".
12
+ These parsers read the entire XML document into memory. They usually provide convenient methods for navigating the document tree or making modifications. DOM parsers are ideal for smaller files, such as configuration files or (X-)HTML pages. However, for larger XML files, you may run into memory limitations. Parsing an XML file using the DOM method can cause memory usage to increase by 10-20 times the size of the raw XML string. If the XML file exceeds the size of [Buffer.constants.MAX_STRING_LENGTH](https://nodejs.org/api/buffer.html#bufferconstantsmax_string_length) (typically 512 MB), the DOM parser may throw an error: "Cannot create a string longer than 0x1fffffe8 characters."
13
13
 
14
- * The **stream** or **event** based parsers. These parser read the XML file "line by line". The biggest advantage of such a parser is, there is no limit in the size of the XML file. You can read XML files having a size of many terabytes, because you read always just a single node.
14
+ 1. **Stream or Event-Based Parsers**
15
15
 
16
- The backside: By default you cannot navigate in the document tree, you know only the current node.
16
+ These parsers read the XML file "line by line" or node by node. The main advantage of this approach is that there is no size limitation for the XML file. You can read XML files of several terabytes because only a single node is read into memory at a time.
17
+
18
+ The downside is that, by default, you cannot navigate the document tree - you can only access the current node.
17
19
 
18
- This module tries to combine both principles. The XML document can be read in chunks and within a chunk you have all the nice features and functions you know from a DOM based parser.
20
+ This module aims to combine both approaches. It reads the XML document in chunks, and within each chunk, you can utilize the familiar features and functions of a DOM-based parser.
19
21
 
20
22
  ## Dependencies
21
- XML documents are read either with [sax](https://www.npmjs.com/package/sax) or [node-expat](https://www.npmjs.com/package/node-expat) parser. More parser may be added in future releases. By default the `sax` parser is used. However, I clearly recommend using the `node-expat` parser. All other parsers I tested, are not compliant to XML standards.
23
+ XML documents are parsed using either the [sax](https://www.npmjs.com/package/sax) or [node-expat](https://www.npmjs.com/package/node-expat) parser. parsers. Additional parsers may be added in future releases. By default, the `sax` parser is used. However, I strongly recommend using the `node-expat` parser, as other parsers I tested are not fully compliant with XML standards.
22
24
 
23
- **NOTE: The `node-expat` module is not automatically installed with this module. Install the parser by yourself, if you like to use it**
25
+ **NOTE: The `node-expat` module is not automatically installed with this module. If you wish to use it, you must install it manually.**
24
26
 
25
27
  ## Installation
26
28
 
27
- Install module like any other node module and optionally `node-expat`:
29
+ To install the module, use the standard Node.js installation process. Optionally, you can also install the `node-expat` parser:
28
30
  ```bash
29
31
  npm install xml-twig
30
32
 
@@ -32,7 +34,7 @@ npm install xml-twig
32
34
  npm install node-expat
33
35
 
34
36
  ```
35
- In my tests I parsed a 900 MB big XML file, the `node-expat` is faster than `sax` (node-expat: around 2:30 Minutes, sax: around 3:40 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why `node-expat` parser is not installed automatically.
37
+ In my tests, I parsed a 900 MB XML file, and the `node-expat`t parser was faster than `sax` (`node-expat`: around 2:30 minutes, `sax`: around 3:40 minutes). However, you may encounter issues when installing the `node-expat` parser, which is why it's not installed automatically.
36
38
 
37
39
  ## How to use it
38
40
 
@@ -72,10 +74,9 @@ API Documentation: see [Twig](./doc/twig.md)
72
74
 
73
75
  - **Read XML Document in chucks**
74
76
 
75
- The key feature of this module is to read and process XML files in chunks. You need to create handler functions for elements you like to process.<br>
76
- The most notable difference to other parsers is the `purge()` and `purgeUpTo()` method. The parser reads the element and you decide how long you need to keep it in the memory.
77
- In many cases you will purge it immediately after you have used it but in some cases you may keep the element for later use. The parser knows the element position in the XML-Tree.
77
+ The key feature of this module is the ability to read and process XML files in chunks. You need to define handler functions for the elements you want to process.
78
78
 
79
+ A major difference compared to other parsers is the `purge()` and `purgeUpTo()` methods. The parser reads an element, and you decide how long to keep it in memory. In many cases, you will purge the element immediately after processing it, but in some situations, you might want to retain it for later use. The parser keeps track of the element’s position within the XML tree.
79
80
 
80
81
  ```js
81
82
  function bookHandler(elt, parserObj) {
@@ -146,10 +147,6 @@ API Documentation: see [Twig](./doc/twig.md)
146
147
 
147
148
  ```
148
149
 
149
- Be aware if you run methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. on the current element. Such calls return empty result, because following element are not yet read from the XML file. You must navigate to an earlier element, e.g.<br>
150
- `elt.root().children()[0].followingSibling()`
151
-
152
-
153
150
  - **Read only parts from XML Document**
154
151
 
155
152
  If you like to read only certain elements, use option `partial: true`. The `root` element is always read.
@@ -294,6 +291,9 @@ Here are some examples the get attribute and values:
294
291
 
295
292
  `.writer(indented|xw)` - **XMLWriter**: Returns a [XMLWriter](https://www.npmjs.com/package/xml-writer) object you can use to print the currently loaded XML tree.<br>Instead of providing an indented parameter (`true`, `false` or indent character) you can also provide an `XMLWriter` object which adds more flexibility.
296
293
 
294
+ Be aware if you call methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. on the current element, they will return empty results. This is because the following elements have not yet been read from the XML file. To navigate to an earlier element, you can use a method like:<br>
295
+ `elt.root().children()[0].followingSibling()`
296
+
297
297
  **condition** Parameter
298
298
 
299
299
  You can specify condition on above methods. You can filter elements by following conditions:
package/doc/twig.md CHANGED
@@ -127,6 +127,7 @@ You can specify a <code>function</code> or a <code>event</code> name</p>
127
127
  * [.children](#Twig+children) : [<code>Array.&lt;Twig&gt;</code>](#Twig) ℗
128
128
  * [.parent](#Twig+parent) : [<code>Twig</code>](#Twig) \| <code>undefined</code> ℗
129
129
  * [.pinned](#Twig+pinned) : <code>boolean</code> ℗
130
+ * [.trim](#Twig+trim) : <code>boolean</code> ℗
130
131
  * [.purge](#Twig+purge)
131
132
  * [.purgeUpTo](#Twig+purgeUpTo)
132
133
  * [.escapeEntity](#Twig+escapeEntity)
@@ -156,6 +157,8 @@ You can specify a <code>function</code> or a <code>event</code> name</p>
156
157
  * [.parent](#Twig+parent) ⇒ [<code>Twig</code>](#Twig)
157
158
  * [.self](#Twig+self) ⇒ [<code>Twig</code>](#Twig)
158
159
  * [.children](#Twig+children) ⇒ [<code>Array.&lt;Twig&gt;</code>](#Twig)
160
+ * [.firstChild](#Twig+firstChild) ⇒ [<code>Twig</code>](#Twig)
161
+ * [.lastChild](#Twig+lastChild) ⇒ [<code>Twig</code>](#Twig)
159
162
  * [.next](#Twig+next) ⇒ [<code>Twig</code>](#Twig)
160
163
  * [.previous](#Twig+previous) ⇒ [<code>Twig</code>](#Twig)
161
164
  * [.first](#Twig+first) ⇒ [<code>Twig</code>](#Twig)
@@ -238,6 +241,13 @@ The parent object. Undefined on root element
238
241
  ### twig.pinned : <code>boolean</code> ℗
239
242
  Determines whether twig is needed in partial load
240
243
 
244
+ **Kind**: instance property of [<code>Twig</code>](#Twig)
245
+ **Access**: private
246
+ <a name="Twig+trim"></a>
247
+
248
+ ### twig.trim : <code>boolean</code> ℗
249
+ Determines whether text is trimmed
250
+
241
251
  **Kind**: instance property of [<code>Twig</code>](#Twig)
242
252
  **Access**: private
243
253
  <a name="Twig+purge"></a>
@@ -327,7 +337,7 @@ Returns the name of the element. Synonym for `twig.name`
327
337
  <a name="Twig+text"></a>
328
338
 
329
339
  ### twig.text ⇒ <code>string</code>
330
- The text of the element. No matter if given as text or CDATA entity
340
+ The text of the element. No matter if given as text or CDATA entity.
331
341
 
332
342
  **Kind**: instance property of [<code>Twig</code>](#Twig)
333
343
  **Returns**: <code>string</code> - Element text or empty string
@@ -492,6 +502,28 @@ All children, optionally matching `condition` of the current element or empty ar
492
502
  | --- | --- | --- |
493
503
  | condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
494
504
 
505
+ <a name="Twig+firstChild"></a>
506
+
507
+ ### twig.firstChild ⇒ [<code>Twig</code>](#Twig)
508
+ The first matching child, optionally matching `condition` of the current element or null
509
+
510
+ **Kind**: instance property of [<code>Twig</code>](#Twig)
511
+
512
+ | Param | Type | Description |
513
+ | --- | --- | --- |
514
+ | condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
515
+
516
+ <a name="Twig+lastChild"></a>
517
+
518
+ ### twig.lastChild ⇒ [<code>Twig</code>](#Twig)
519
+ The last matching child, optionally matching `condition` of the current element or null
520
+
521
+ **Kind**: instance property of [<code>Twig</code>](#Twig)
522
+
523
+ | Param | Type | Description |
524
+ | --- | --- | --- |
525
+ | condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
526
+
495
527
  <a name="Twig+next"></a>
496
528
 
497
529
  ### twig.next ⇒ [<code>Twig</code>](#Twig)
@@ -761,6 +793,7 @@ Common function to filter Twig element
761
793
  * [.children](#Twig+children) : [<code>Array.&lt;Twig&gt;</code>](#Twig) ℗
762
794
  * [.parent](#Twig+parent) : [<code>Twig</code>](#Twig) \| <code>undefined</code> ℗
763
795
  * [.pinned](#Twig+pinned) : <code>boolean</code> ℗
796
+ * [.trim](#Twig+trim) : <code>boolean</code> ℗
764
797
  * [.purge](#Twig+purge)
765
798
  * [.purgeUpTo](#Twig+purgeUpTo)
766
799
  * [.escapeEntity](#Twig+escapeEntity)
@@ -790,6 +823,8 @@ Common function to filter Twig element
790
823
  * [.parent](#Twig+parent) ⇒ [<code>Twig</code>](#Twig)
791
824
  * [.self](#Twig+self) ⇒ [<code>Twig</code>](#Twig)
792
825
  * [.children](#Twig+children) ⇒ [<code>Array.&lt;Twig&gt;</code>](#Twig)
826
+ * [.firstChild](#Twig+firstChild) ⇒ [<code>Twig</code>](#Twig)
827
+ * [.lastChild](#Twig+lastChild) ⇒ [<code>Twig</code>](#Twig)
793
828
  * [.next](#Twig+next) ⇒ [<code>Twig</code>](#Twig)
794
829
  * [.previous](#Twig+previous) ⇒ [<code>Twig</code>](#Twig)
795
830
  * [.first](#Twig+first) ⇒ [<code>Twig</code>](#Twig)
@@ -872,6 +907,13 @@ The parent object. Undefined on root element
872
907
  ### twig.pinned : <code>boolean</code> ℗
873
908
  Determines whether twig is needed in partial load
874
909
 
910
+ **Kind**: instance property of [<code>Twig</code>](#Twig)
911
+ **Access**: private
912
+ <a name="Twig+trim"></a>
913
+
914
+ ### twig.trim : <code>boolean</code> ℗
915
+ Determines whether text is trimmed
916
+
875
917
  **Kind**: instance property of [<code>Twig</code>](#Twig)
876
918
  **Access**: private
877
919
  <a name="Twig+purge"></a>
@@ -961,7 +1003,7 @@ Returns the name of the element. Synonym for `twig.name`
961
1003
  <a name="Twig+text"></a>
962
1004
 
963
1005
  ### twig.text ⇒ <code>string</code>
964
- The text of the element. No matter if given as text or CDATA entity
1006
+ The text of the element. No matter if given as text or CDATA entity.
965
1007
 
966
1008
  **Kind**: instance property of [<code>Twig</code>](#Twig)
967
1009
  **Returns**: <code>string</code> - Element text or empty string
@@ -1126,6 +1168,28 @@ All children, optionally matching `condition` of the current element or empty ar
1126
1168
  | --- | --- | --- |
1127
1169
  | condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
1128
1170
 
1171
+ <a name="Twig+firstChild"></a>
1172
+
1173
+ ### twig.firstChild ⇒ [<code>Twig</code>](#Twig)
1174
+ The first matching child, optionally matching `condition` of the current element or null
1175
+
1176
+ **Kind**: instance property of [<code>Twig</code>](#Twig)
1177
+
1178
+ | Param | Type | Description |
1179
+ | --- | --- | --- |
1180
+ | condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
1181
+
1182
+ <a name="Twig+lastChild"></a>
1183
+
1184
+ ### twig.lastChild ⇒ [<code>Twig</code>](#Twig)
1185
+ The last matching child, optionally matching `condition` of the current element or null
1186
+
1187
+ **Kind**: instance property of [<code>Twig</code>](#Twig)
1188
+
1189
+ | Param | Type | Description |
1190
+ | --- | --- | --- |
1191
+ | condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
1192
+
1129
1193
  <a name="Twig+next"></a>
1130
1194
 
1131
1195
  ### twig.next ⇒ [<code>Twig</code>](#Twig)
@@ -1446,7 +1510,7 @@ Generic error for unsupported condition
1446
1510
 
1447
1511
  ## SAX
1448
1512
  **Kind**: global constant
1449
- **Version:**: 1.7.12
1513
+ **Version:**: 1.9.1
1450
1514
  **Author:**: Wernfried Domscheit
1451
1515
  **Copyright:**: Copyright (c) 2025 Wernfried Domscheit. All rights reserved.
1452
1516
  **Website:**: https://www.npmjs.com/package/xml-twig
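The `firstChild` and `lastChild` entries documented above can be illustrated with a small self-contained sketch. It does not load `xml-twig`. `MiniTwig` is a hypothetical stand-in class, and its `children(condition)` is simplified to accept only an optional element name, whereas the real method takes an `ElementCondition`. The two accessors follow the documented behaviour: return the first or last matching child, or `null` when nothing matches.

```javascript
// Hypothetical stand-in for the Twig class; only what the sketch needs.
class MiniTwig {
  constructor(name, children = []) {
    this.name = name;
    this.kids = children;
  }

  // Simplified assumption: the condition is either undefined or an element name.
  children(condition) {
    return condition === undefined
      ? this.kids
      : this.kids.filter((child) => child.name === condition);
  }

  // First matching child, or null if there is none.
  firstChild(condition) {
    const matches = this.children(condition);
    return matches.length === 0 ? null : matches[0];
  }

  // Last matching child, or null if there is none.
  lastChild(condition) {
    const matches = this.children(condition);
    return matches.length === 0 ? null : matches[matches.length - 1];
  }
}

const firstAuthor = new MiniTwig('author');
const secondAuthor = new MiniTwig('author');
const book = new MiniTwig('book', [
  new MiniTwig('title'),
  firstAuthor,
  secondAuthor,
  new MiniTwig('price')
]);

console.log(book.firstChild().name);                    // first child overall
console.log(book.lastChild('author') === secondAuthor); // last <author> child
console.log(book.firstChild('isbn'));                   // no match
```

Returning `null` rather than `undefined` for "no match" means callers should test the result before dereferencing it.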
package/package.json CHANGED
@@ -5,7 +5,7 @@
5
5
  },
6
6
  "name": "xml-twig",
7
7
  "description": "Node module for processing huge XML documents in tree mode",
8
- "version": "1.7.13",
8
+ "version": "1.9.1",
9
9
  "main": "twig.js",
10
10
  "directories": {
11
11
  "doc": "doc"
@@ -15,16 +15,15 @@
15
15
  "doc/*.md"
16
16
  ],
17
17
  "devDependencies": {
18
- "jsdoc-to-markdown": "^9.0.0",
19
- "luxon": "^3.5.0",
18
+ "jsdoc-to-markdown": "^9.1.1",
19
+ "luxon": "^3.6.1",
20
20
  "node-expat": "^2.4.1"
21
21
  },
22
22
  "scripts": {
23
23
  "test": "node demo.js",
24
- "preversion": "jsdoc2md --private twig.js > doc/twig.md",
25
- "postversion": "sed -i -e \"s/@version: .*/@version: %npm_package_version%/\" twig.js",
26
- "prepack": "unix2dos twig.js doc/twig.md",
27
- "prepare": "git commit -a -m \"Updated doc and version\""
24
+ "postversion": "sed -bi -e \"s/@version: .*/@version: %npm_package_version%/\" twig.js",
25
+ "prepare": "jsdoc2md --EOL win32 --private twig.js > doc/twig.md",
26
+ "postpack": "git commit -a -m \"Updated doc and version\""
28
27
  },
29
28
  "repository": {
30
29
  "type": "git",
package/twig.js CHANGED
@@ -1,5 +1,5 @@
1
1
  /**
2
- * @version: 1.7.13
2
+ * @version: 1.9.1
3
3
  * @author: Wernfried Domscheit
4
4
  * @copyright: Copyright (c) 2025 Wernfried Domscheit. All rights reserved.
5
5
  * @website: https://www.npmjs.com/package/xml-twig
@@ -278,6 +278,12 @@ function createParser(handler, options = {}) {
278
278
  enumerable: true
279
279
  });
280
280
 
281
+ Object.defineProperty(parser, 'trimText', {
282
+ value: options.trim,
283
+ writable: false,
284
+ enumerable: true
285
+ });
286
+
281
287
  if (options.file != null) {
282
288
  Object.defineProperty(parser, 'file', {
283
289
  value: options.file,
@@ -289,7 +295,7 @@ function createParser(handler, options = {}) {
289
295
  // Common events
290
296
  parser.on('text', function (str) {
291
297
  if (parser.twig.current === null) return;
292
- parser.twig.current.text = options.trim ? str.trim() : str;
298
+ parser.twig.current.text = str;
293
299
  });
294
300
 
295
301
  parser.on("comment", function (str) {
@@ -513,6 +519,12 @@ class Twig {
513
519
  */
514
520
  #pinned = false;
515
521
 
522
+ /**
523
+ * Determines whether text is trimmed
524
+ * @type {boolean}
525
+ */
526
+ #trim = true;
527
+
516
528
  /**
517
529
  * Create a new Twig object
518
530
  * @param {Parser} parser - The main parser object
@@ -525,6 +537,7 @@ class Twig {
525
537
  if (index === undefined)
526
538
  parser.twig.current = this;
527
539
 
540
+ this.#trim = parser.trimText;
528
541
  if (name === null) {
529
542
  // Root element not available yet
530
543
  parser.twig.tree = this;
@@ -698,11 +711,15 @@ class Twig {
698
711
  }
699
712
 
700
713
  /**
701
- * The text of the element. No matter if given as text or CDATA entity
714
+ * The text of the element. No matter if given as text or CDATA entity.
715
+ * If option `trim: true`, then whitespace from both ends of the string are removed
702
716
  * @returns {string} Element text or empty string
703
717
  */
704
718
  get text() {
705
- return this.#text ?? '';
719
+ if (this.#text === null)
720
+ return ''
721
+ else
722
+ return this.#trim ? this.#text.trim() : this.#text;
706
723
  }
707
724
 
708
725
  /**
@@ -958,6 +975,26 @@ class Twig {
958
975
  return this.filterElements(this.#children, condition);
959
976
  };
960
977
 
978
+ /**
979
+ * The first matching child, optionally matching `condition` of the current element or null
980
+ * @param {ElementCondition} condition - Optional condition
981
+ * @returns {?Twig}
982
+ */
983
+ firstChild = function (condition) {
984
+ let _children = this.children(condition);
985
+ return _children.length == 0 ? null : _children[0];
986
+ };
987
+
988
+ /**
989
+ * The last matching child, optionally matching `condition` of the current element or null
990
+ * @param {ElementCondition} condition - Optional condition
991
+ * @returns {?Twig}
992
+ */
993
+ lastChild = function (condition) {
994
+ let _children = this.children(condition);
995
+ return _children.length == 0 ? null : _children[_children.length - 1];
996
+ };
997
+
961
998
  /**
962
999
  * Returns the next matching element.
963
1000
  * @param {ElementCondition} condition - Optional condition
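The `text` change in `package/twig.js` moves trimming out of the parser's `text` event and into the `text` getter, so the raw string is stored and whitespace is removed only when the value is read. The sketch below isolates that pattern under stated assumptions. `TextHolder` is a hypothetical class, not part of `xml-twig`, and its setter simply overwrites the stored value, which may differ from the real `Twig` setter.

```javascript
// Hypothetical class illustrating trim-on-read; not the library's Twig class.
class TextHolder {
  #text = null; // raw text exactly as received
  #trim;        // whether the getter should trim

  constructor(trim) {
    this.#trim = trim;
  }

  // Assumption: store the value unchanged (no trimming at write time).
  set text(value) {
    this.#text = value;
  }

  // Mirrors the getter in the diff: empty string when no text was set,
  // otherwise the stored text, trimmed only if the flag is truthy.
  get text() {
    if (this.#text === null) return '';
    return this.#trim ? this.#text.trim() : this.#text;
  }
}

const trimmed = new TextHolder(true);
trimmed.text = '  Hello World \n';

const raw = new TextHolder(false);
raw.text = '  Hello World \n';

console.log(JSON.stringify(trimmed.text));
console.log(JSON.stringify(raw.text));
console.log(JSON.stringify(new TextHolder(true).text));
```

One consequence of this design is that the stored text keeps its original whitespace regardless of the option, and only the value handed back to the caller depends on the flag.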