xml-twig 1.7.13 → 1.9.1
- package/README.md +18 -18
- package/doc/twig.md +67 -3
- package/package.json +6 -7
- package/twig.js +41 -4
package/README.md
CHANGED
@@ -4,27 +4,29 @@ Node module for processing huge XML documents in tree mode
 Inspired by Perl module [XML::Twig](https://metacpan.org/pod/XML::Twig)


-## When
-When you need to read
+## When to Use This Module and Its Motivation
+When you need to read an XML file, there are two primary approaches:

-
+1. **The Document Object Model (DOM) Style**

-DOM parsers are
+These parsers read the entire XML document into memory. They usually provide convenient methods for navigating the document tree or making modifications. DOM parsers are ideal for smaller files, such as configuration files or (X-)HTML pages. However, for larger XML files, you may run into memory limitations. Parsing an XML file using the DOM method can cause memory usage to increase by 10-20 times the size of the raw XML string. If the XML file exceeds the size of [Buffer.constants.MAX_STRING_LENGTH](https://nodejs.org/api/buffer.html#bufferconstantsmax_string_length) (typically 512 MB), the DOM parser may throw an error: "Cannot create a string longer than 0x1fffffe8 characters."

-
+1. **Stream or Event-Based Parsers**

-The
+These parsers read the XML file "line by line" or node by node. The main advantage of this approach is that there is no size limitation for the XML file. You can read XML files of several terabytes because only a single node is read into memory at a time.
+
+The downside is that, by default, you cannot navigate the document tree - you can only access the current node.

-This module
+This module aims to combine both approaches. It reads the XML document in chunks, and within each chunk, you can utilize the familiar features and functions of a DOM-based parser.

 ## Dependencies
-XML documents are
+XML documents are parsed using either the [sax](https://www.npmjs.com/package/sax) or [node-expat](https://www.npmjs.com/package/node-expat) parser. Additional parsers may be added in future releases. By default, the `sax` parser is used. However, I strongly recommend using the `node-expat` parser, as other parsers I tested are not fully compliant with XML standards.

-**NOTE: The `node-expat` module is not automatically installed with this module.
+**NOTE: The `node-expat` module is not automatically installed with this module. If you wish to use it, you must install it manually.**

 ## Installation

-
+To install the module, use the standard Node.js installation process. Optionally, you can also install the `node-expat` parser:
 ```bash
 npm install xml-twig

@@ -32,7 +34,7 @@ npm install xml-twig
 npm install node-expat

 ```
-In my tests I parsed a 900 MB
+In my tests, I parsed a 900 MB XML file, and the `node-expat` parser was faster than `sax` (`node-expat`: around 2:30 minutes, `sax`: around 3:40 minutes). However, you may encounter issues when installing the `node-expat` parser, which is why it's not installed automatically.

 ## How to use it

@@ -72,10 +74,9 @@ API Documentation: see [Twig](./doc/twig.md)

 - **Read XML Document in chunks**

-
-The most notable difference to other parsers is the `purge()` and `purgeUpTo()` method. The parser reads the element and you decide how long you need to keep it in the memory.
-In many cases you will purge it immediately after you have used it but in some cases you may keep the element for later use. The parser knows the element position in the XML-Tree.
+The key feature of this module is the ability to read and process XML files in chunks. You need to define handler functions for the elements you want to process.

+A major difference compared to other parsers is the `purge()` and `purgeUpTo()` methods. The parser reads an element, and you decide how long to keep it in memory. In many cases, you will purge the element immediately after processing it, but in some situations, you might want to retain it for later use. The parser keeps track of the element’s position within the XML tree.

 ```js
 function bookHandler(elt, parserObj) {
@@ -146,10 +147,6 @@ API Documentation: see [Twig](./doc/twig.md)

 ```

-Be aware if you run methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. on the current element. Such calls return empty result, because following element are not yet read from the XML file. You must navigate to an earlier element, e.g.<br>
-`elt.root().children()[0].followingSibling()`
-
-
 - **Read only parts from XML Document**

 If you want to read only certain elements, use the option `partial: true`. The `root` element is always read.
@@ -294,6 +291,9 @@ Here are some examples to get attribute and values:

 `.writer(indented|xw)` - **XMLWriter**: Returns an [XMLWriter](https://www.npmjs.com/package/xml-writer) object you can use to print the currently loaded XML tree.<br>Instead of providing an indented parameter (`true`, `false` or indent character) you can also provide an `XMLWriter` object which adds more flexibility.

+Be aware that methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. return empty results when called on the current element, because the following elements have not yet been read from the XML file. Such calls only work from an earlier element, for example:<br>
+`elt.root().children()[0].followingSibling()`
+
 **condition** Parameter

 You can specify a condition on the methods above. You can filter elements by the following conditions:
package/doc/twig.md
CHANGED
@@ -127,6 +127,7 @@ You can specify a <code>function</code> or a <code>event</code> name</p>
 * [.children](#Twig+children) : [<code>Array.<Twig></code>](#Twig) ℗
 * [.parent](#Twig+parent) : [<code>Twig</code>](#Twig) \| <code>undefined</code> ℗
 * [.pinned](#Twig+pinned) : <code>boolean</code> ℗
+* [.trim](#Twig+trim) : <code>boolean</code> ℗
 * [.purge](#Twig+purge)
 * [.purgeUpTo](#Twig+purgeUpTo)
 * [.escapeEntity](#Twig+escapeEntity)
@@ -156,6 +157,8 @@ You can specify a <code>function</code> or a <code>event</code> name</p>
 * [.parent](#Twig+parent) ⇒ [<code>Twig</code>](#Twig)
 * [.self](#Twig+self) ⇒ [<code>Twig</code>](#Twig)
 * [.children](#Twig+children) ⇒ [<code>Array.<Twig></code>](#Twig)
+* [.firstChild](#Twig+firstChild) ⇒ [<code>Twig</code>](#Twig)
+* [.lastChild](#Twig+lastChild) ⇒ [<code>Twig</code>](#Twig)
 * [.next](#Twig+next) ⇒ [<code>Twig</code>](#Twig)
 * [.previous](#Twig+previous) ⇒ [<code>Twig</code>](#Twig)
 * [.first](#Twig+first) ⇒ [<code>Twig</code>](#Twig)
@@ -238,6 +241,13 @@ The parent object. Undefined on root element
 ### twig.pinned : <code>boolean</code> ℗
 Determines whether twig is needed in partial load

+**Kind**: instance property of [<code>Twig</code>](#Twig)
+**Access**: private
+<a name="Twig+trim"></a>
+
+### twig.trim : <code>boolean</code> ℗
+Determines whether text is trimmed
+
 **Kind**: instance property of [<code>Twig</code>](#Twig)
 **Access**: private
 <a name="Twig+purge"></a>
@@ -327,7 +337,7 @@ Returns the name of the element. Synonym for `twig.name`
 <a name="Twig+text"></a>

 ### twig.text ⇒ <code>string</code>
-The text of the element. No matter if given as text or CDATA entity
+The text of the element. No matter if given as text or CDATA entity.

 **Kind**: instance property of [<code>Twig</code>](#Twig)
 **Returns**: <code>string</code> - Element text or empty string
@@ -492,6 +502,28 @@ All children, optionally matching `condition` of the current element or empty ar
 | --- | --- | --- |
 | condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |

+<a name="Twig+firstChild"></a>
+
+### twig.firstChild ⇒ [<code>Twig</code>](#Twig)
+The first matching child, optionally matching `condition` of the current element or null
+
+**Kind**: instance property of [<code>Twig</code>](#Twig)
+
+| Param | Type | Description |
+| --- | --- | --- |
+| condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
+
+<a name="Twig+lastChild"></a>
+
+### twig.lastChild ⇒ [<code>Twig</code>](#Twig)
+The last matching child, optionally matching `condition` of the current element or null
+
+**Kind**: instance property of [<code>Twig</code>](#Twig)
+
+| Param | Type | Description |
+| --- | --- | --- |
+| condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
+
 <a name="Twig+next"></a>

 ### twig.next ⇒ [<code>Twig</code>](#Twig)
@@ -761,6 +793,7 @@ Common function to filter Twig element
 * [.children](#Twig+children) : [<code>Array.<Twig></code>](#Twig) ℗
 * [.parent](#Twig+parent) : [<code>Twig</code>](#Twig) \| <code>undefined</code> ℗
 * [.pinned](#Twig+pinned) : <code>boolean</code> ℗
+* [.trim](#Twig+trim) : <code>boolean</code> ℗
 * [.purge](#Twig+purge)
 * [.purgeUpTo](#Twig+purgeUpTo)
 * [.escapeEntity](#Twig+escapeEntity)
@@ -790,6 +823,8 @@ Common function to filter Twig element
 * [.parent](#Twig+parent) ⇒ [<code>Twig</code>](#Twig)
 * [.self](#Twig+self) ⇒ [<code>Twig</code>](#Twig)
 * [.children](#Twig+children) ⇒ [<code>Array.<Twig></code>](#Twig)
+* [.firstChild](#Twig+firstChild) ⇒ [<code>Twig</code>](#Twig)
+* [.lastChild](#Twig+lastChild) ⇒ [<code>Twig</code>](#Twig)
 * [.next](#Twig+next) ⇒ [<code>Twig</code>](#Twig)
 * [.previous](#Twig+previous) ⇒ [<code>Twig</code>](#Twig)
 * [.first](#Twig+first) ⇒ [<code>Twig</code>](#Twig)
@@ -872,6 +907,13 @@ The parent object. Undefined on root element
 ### twig.pinned : <code>boolean</code> ℗
 Determines whether twig is needed in partial load

+**Kind**: instance property of [<code>Twig</code>](#Twig)
+**Access**: private
+<a name="Twig+trim"></a>
+
+### twig.trim : <code>boolean</code> ℗
+Determines whether text is trimmed
+
 **Kind**: instance property of [<code>Twig</code>](#Twig)
 **Access**: private
 <a name="Twig+purge"></a>
@@ -961,7 +1003,7 @@ Returns the name of the element. Synonym for `twig.name`
 <a name="Twig+text"></a>

 ### twig.text ⇒ <code>string</code>
-The text of the element. No matter if given as text or CDATA entity
+The text of the element. No matter if given as text or CDATA entity.

 **Kind**: instance property of [<code>Twig</code>](#Twig)
 **Returns**: <code>string</code> - Element text or empty string
@@ -1126,6 +1168,28 @@ All children, optionally matching `condition` of the current element or empty ar
 | --- | --- | --- |
 | condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |

+<a name="Twig+firstChild"></a>
+
+### twig.firstChild ⇒ [<code>Twig</code>](#Twig)
+The first matching child, optionally matching `condition` of the current element or null
+
+**Kind**: instance property of [<code>Twig</code>](#Twig)
+
+| Param | Type | Description |
+| --- | --- | --- |
+| condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
+
+<a name="Twig+lastChild"></a>
+
+### twig.lastChild ⇒ [<code>Twig</code>](#Twig)
+The last matching child, optionally matching `condition` of the current element or null
+
+**Kind**: instance property of [<code>Twig</code>](#Twig)
+
+| Param | Type | Description |
+| --- | --- | --- |
+| condition | [<code>ElementCondition</code>](#ElementCondition) | Optional condition |
+
 <a name="Twig+next"></a>

 ### twig.next ⇒ [<code>Twig</code>](#Twig)
@@ -1446,7 +1510,7 @@ Generic error for unsupported condition

 ## SAX
 **Kind**: global constant
-**Version:**: 1.
+**Version:**: 1.9.1
 **Author:**: Wernfried Domscheit
 **Copyright:**: Copyright (c) 2025 Wernfried Domscheit. All rights reserved.
 **Website:**: https://www.npmjs.com/package/xml-twig
package/package.json
CHANGED
@@ -5,7 +5,7 @@
   },
   "name": "xml-twig",
   "description": "Node module for processing huge XML documents in tree mode",
-  "version": "1.
+  "version": "1.9.1",
   "main": "twig.js",
   "directories": {
     "doc": "doc"
@@ -15,16 +15,15 @@
     "doc/*.md"
   ],
   "devDependencies": {
-    "jsdoc-to-markdown": "^9.
-    "luxon": "^3.
+    "jsdoc-to-markdown": "^9.1.1",
+    "luxon": "^3.6.1",
     "node-expat": "^2.4.1"
   },
   "scripts": {
     "test": "node demo.js",
-    "
-    "
-    "
-    "prepare": "git commit -a -m \"Updated doc and version\""
+    "postversion": "sed -bi -e \"s/@version: .*/@version: %npm_package_version%/\" twig.js",
+    "prepare": "jsdoc2md --EOL win32 --private twig.js > doc/twig.md",
+    "postpack": "git commit -a -m \"Updated doc and version\""
   },
   "repository": {
     "type": "git",
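The new `postversion` script uses `sed` to rewrite the `@version:` tag in `twig.js`; `%npm_package_version%` is the Windows `cmd` form of the environment variable that npm sets while running scripts. As a rough illustration of what that substitution does, here is a minimal JavaScript sketch. The helper name `stampVersion` and the sample header (including its `1.0.0` value) are hypothetical and not part of the package.

```javascript
// Hypothetical helper mirroring: sed -e "s/@version: .*/@version: <version>/"
// `.` never matches a line break in JavaScript, so each match stays on its own line.
function stampVersion(source, version) {
  // A replacer function avoids special handling of `$` in the replacement text.
  return source.replace(/@version: .*/g, () => `@version: ${version}`);
}

// Made-up sample header, only for demonstration.
const sampleHeader = [
  '/**',
  ' * @version: 1.0.0',
  ' * @author: Example Author',
  ' */'
].join('\n');

console.log(stampVersion(sampleHeader, '1.9.1'));
```

Only the line carrying the `@version:` tag changes; the remaining header lines pass through untouched.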
package/twig.js
CHANGED
@@ -1,5 +1,5 @@
 /**
- * @version: 1.
+ * @version: 1.9.1
  * @author: Wernfried Domscheit
  * @copyright: Copyright (c) 2025 Wernfried Domscheit. All rights reserved.
  * @website: https://www.npmjs.com/package/xml-twig
@@ -278,6 +278,12 @@ function createParser(handler, options = {}) {
     enumerable: true
   });

+  Object.defineProperty(parser, 'trimText', {
+    value: options.trim,
+    writable: false,
+    enumerable: true
+  });
+
   if (options.file != null) {
     Object.defineProperty(parser, 'file', {
       value: options.file,
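The hunk above attaches `trimText` to the parser with `writable: false`, which makes it a read-only, enumerable property. A minimal standalone sketch of that behaviour follows; `demoParser`, `demoOptions`, and `tryAssign` are hypothetical stand-ins for illustration, not the package's real objects.

```javascript
// Stand-ins for the real parser object and the caller's options (hypothetical).
const demoParser = {};
const demoOptions = { trim: false };

Object.defineProperty(demoParser, 'trimText', {
  value: demoOptions.trim,
  writable: false,
  enumerable: true
});

// Helper (not from the package): reports whether an assignment was accepted.
function tryAssign(target, key, value) {
  'use strict'; // in strict mode, writing to a read-only property throws a TypeError
  try {
    target[key] = value;
    return true;
  } catch (err) {
    return false;
  }
}

console.log(Object.keys(demoParser));                 // the key is listed because it is enumerable
console.log(tryAssign(demoParser, 'trimText', true)); // the write is rejected
console.log(demoParser.trimText);                     // the original value is kept
```

The design consequence is that the trim setting is fixed once the parser has been created; code that wants a different setting would need a new parser.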
@@ -289,7 +295,7 @@ function createParser(handler, options = {}) {
   // Common events
   parser.on('text', function (str) {
     if (parser.twig.current === null) return;
-    parser.twig.current.text =
+    parser.twig.current.text = str;
   });

   parser.on("comment", function (str) {
@@ -513,6 +519,12 @@ class Twig {
    */
   #pinned = false;

+  /**
+   * Determines whether text is trimmed
+   * @type {boolean}
+   */
+  #trim = true;
+
   /**
    * Create a new Twig object
    * @param {Parser} parser - The main parser object
@@ -525,6 +537,7 @@ class Twig {
     if (index === undefined)
       parser.twig.current = this;

+    this.#trim = parser.trimText;
     if (name === null) {
       // Root element not available yet
       parser.twig.tree = this;
@@ -698,11 +711,15 @@ class Twig {
   }

   /**
-   * The text of the element. No matter if given as text or CDATA entity
+   * The text of the element. No matter if given as text or CDATA entity.
+   * If option `trim: true`, then whitespace from both ends of the string are removed
    * @returns {string} Element text or empty string
    */
   get text() {
-
+    if (this.#text === null)
+      return ''
+    else
+      return this.#trim ? this.#text.trim() : this.#text;
   }

   /**
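The changed getter stores the raw text and applies trimming only when the value is read. The following is a minimal sketch of that logic in isolation; the class name `TextHolder` and its constructor signature are hypothetical, and the default of `true` for the second argument is an assumption taken from the `#trim = true` field initializer shown earlier in the diff.

```javascript
// Hypothetical stand-in that isolates the getter logic from the hunk above.
class TextHolder {
  #text;
  #trim;

  constructor(text, trim = true) {
    this.#text = text; // raw text as delivered by the parser, or null
    this.#trim = trim; // whether to strip whitespace when reading
  }

  get text() {
    if (this.#text === null)
      return '';
    return this.#trim ? this.#text.trim() : this.#text;
  }
}

console.log(JSON.stringify(new TextHolder('  Hello  ').text));        // trimmed on read
console.log(JSON.stringify(new TextHolder('  Hello  ', false).text)); // raw text kept
console.log(JSON.stringify(new TextHolder(null).text));               // empty string
```

In the diff itself, `#trim` is overwritten in the constructor with `parser.trimText`, which is set from `options.trim`. If that option were left undefined, the getter would take the untrimmed branch; whether the package applies a default before this point is not visible in these hunks.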
@@ -958,6 +975,26 @@ class Twig {
     return this.filterElements(this.#children, condition);
   };

+  /**
+   * The first matching child, optionally matching `condition` of the current element or null
+   * @param {ElementCondition} condition - Optional condition
+   * @returns {?Twig}
+   */
+  firstChild = function (condition) {
+    let _children = this.children(condition);
+    return _children.length == 0 ? null : _children[0];
+  };
+
+  /**
+   * The last matching child, optionally matching `condition` of the current element or null
+   * @param {ElementCondition} condition - Optional condition
+   * @returns {?Twig}
+   */
+  lastChild = function (condition) {
+    let _children = this.children(condition);
+    return _children.length == 0 ? null : _children[_children.length - 1];
+  };
+
   /**
    * Returns the next matching element.
    * @param {ElementCondition} condition - Optional condition
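Both new methods delegate to `children(condition)` and then pick one end of the filtered array, returning `null` when nothing matches. A minimal sketch of that pattern is shown below. `MiniTwig` is a hypothetical stand-in, and its `children()` accepts only an optional element name, which is a simplification of the package's `ElementCondition`.

```javascript
// Hypothetical stand-in for Twig; only the parts needed for the sketch.
class MiniTwig {
  constructor(name, kids = []) {
    this.name = name;
    this.kids = kids;
  }

  // Simplified filter: `condition` is an optional element name (assumption).
  children(condition) {
    return condition === undefined
      ? this.kids
      : this.kids.filter((kid) => kid.name === condition);
  }

  // Same selection logic as the firstChild/lastChild methods in the diff.
  firstChild(condition) {
    const matches = this.children(condition);
    return matches.length === 0 ? null : matches[0];
  }

  lastChild(condition) {
    const matches = this.children(condition);
    return matches.length === 0 ? null : matches[matches.length - 1];
  }
}

const shelf = new MiniTwig('shelf', [
  new MiniTwig('book'),
  new MiniTwig('magazine'),
  new MiniTwig('book')
]);

console.log(shelf.firstChild().name);           // first child of any name
console.log(shelf.lastChild('magazine').name);  // last child named "magazine"
console.log(shelf.firstChild('newspaper'));     // no match yields null
```

Because the filter runs before the selection, `lastChild('book')` returns the last matching child rather than the last child overall.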