xml-twig 1.0.0
- package/LICENSE +674 -0
- package/README.md +378 -0
- package/demo/demo.js +18 -0
- package/demo/memory-test.js +71 -0
- package/demo/speed-test.js +66 -0
- package/doc/build.sh +6 -0
- package/doc/twig.md +1449 -9
- package/package.json +37 -0
- package/samples/bookstore.xml +48 -0
- package/samples/breakfast-menu.xml +43 -0
- package/samples/processingInstruction.xml +29 -0
- package/samples/xmlns.xml +19 -0
- package/twig.js +1139 -0
package/README.md
ADDED
|
@@ -0,0 +1,378 @@
|
|
|
1
|
+
# xml-twig
|
|
2
|
+
Node module for processing huge XML documents in tree mode
|
|
3
|
+
|
|
4
|
+
Inspired by Perl module [XML::Twig](https://metacpan.org/pod/XML::Twig)
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
## When should I use this, motivation of this module
|
|
8
|
+
When you need to read an XML file, there are two principles to choose from:
|
|
9
|
+
|
|
10
|
+
* The **Document Object Model (DOM)** style. These parsers read the entire XML document into memory. They usually provide convenient methods to navigate the document tree or to modify it.
|
|
11
|
+
|
|
12
|
+
DOM parsers are perfect for rather small files, for example configuration files or (X-)HTML pages. However, for bigger XML files you may run into memory limits.
|
|
13
|
+
|
|
14
|
+
* The **stream** or **event** based parsers. These parsers read the XML file "line by line". Their biggest advantage is that there is no limit on the size of the XML file. You can read XML files many terabytes in size, because only a single node is read at a time.
|
|
15
|
+
|
|
16
|
+
The downside: by default you cannot navigate the document tree; you only know the current node.
|
|
17
|
+
|
|
18
|
+
This module tries to combine both principles. The XML document can be read in chunks and within a chunk you have all the nice features and functions you know from a DOM based parser.
|
|
19
|
+
|
|
20
|
+
## Dependencies
|
|
21
|
+
XML documents are read with either the [sax](https://www.npmjs.com/package/sax) or the [node-expat](https://www.npmjs.com/package/node-expat) parser. More parsers may be added in future releases. By default the `sax` parser is used.
|
|
22
|
+
|
|
23
|
+
**NOTE: Neither the `sax` nor the `node-expat` module is installed automatically with this module. Install the desired parser yourself.**
|
|
24
|
+
|
|
25
|
+
## Installation
|
|
26
|
+
|
|
27
|
+
Install the module like any other Node module, together with the desired underlying parser:
|
|
28
|
+
```
|
|
29
|
+
npm install xml-twig
|
|
30
|
+
|
|
31
|
+
npm install sax
|
|
32
|
+
# and/or
|
|
33
|
+
npm install node-expat
|
|
34
|
+
|
|
35
|
+
```
|
|
36
|
+
In my tests I parsed a 750 MB XML file; `node-expat` was roughly twice as fast as `sax` (node-expat: 2:20 minutes, sax: 4:34 minutes). However, you may run into problems when you try to install the `node-expat` parser. That is why the underlying parsers are not installed automatically.
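As a quick sanity check on the timings quoted above, the ratio can be computed directly. This is only a sketch; the helper `toSeconds` is a hypothetical name introduced for illustration and is not part of `xml-twig`.

```javascript
// Hypothetical helper (not part of xml-twig): convert "m:ss" into seconds.
function toSeconds(duration) {
  const [minutes, seconds] = duration.split(':').map(Number);
  return minutes * 60 + seconds;
}

// Timings quoted above for the 750 MB file.
const expatSeconds = toSeconds('2:20'); // 140 seconds
const saxSeconds = toSeconds('4:34');   // 274 seconds

// How many times longer sax took than node-expat.
const ratio = saxSeconds / expatSeconds;
console.log(ratio.toFixed(2)); // 1.96
```

The result is close to 2, so on this particular file `sax` took about twice as long as `node-expat`. Other files and machines may behave differently.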
|
|
37
|
+
|
|
38
|
+
|
|
39
|
+
## How to use it
|
|
40
|
+
|
|
41
|
+
#### Names and Definitions
|
|
42
|
+
|
|
43
|
+
In XML-Path, there are seven kinds of nodes: `element`, `attribute`, `text`, `namespace`, `processingInstruction`, `comment`, and `document`, see [Nodes at W3C](https://www.w3.org/TR/xpath-datamodel-31/#Node). XML documents are treated as trees of nodes.
|
|
44
|
+
|
|
45
|
+
The [Twig](./doc/twig.md#Twig) class models a kind of element tree. I try to follow the [XML-Path](https://www.w3.org/TR/xpath/) conventions whenever possible to avoid confusion.
|
|
46
|
+
|
|
47
|
+
|
|
48
|
+
#### XML-Namespaces
|
|
49
|
+
|
|
50
|
+
When the XML file uses [Namespaces](https://www.w3schools.com/xml/xml_namespaces.asp), you can address the elements as they appear in the file, for example `cd:data`.
|
|
51
|
+
With option `{ namespaces : true }` you will get access to the `.namespace` property.
|
|
52
|
+
|
|
53
|
+
### Read XML Document
|
|
54
|
+
|
|
55
|
+
- Read entire XML file at once
|
|
56
|
+
|
|
57
|
+
This module is designed to read huge XML files. Of course, it also works well for small files. First create the Twig parser, then create a stream and pipe it to the parser.
|
|
58
|
+
|
|
59
|
+
```
|
|
60
|
+
const fs = require('fs')
|
|
61
|
+
const twig = require('xml-twig')
|
|
62
|
+
|
|
63
|
+
function rootHandler(elt) {
|
|
64
|
+
console.log(`${elt.name} finished after ${elt.line} lines`);
|
|
65
|
+
}
|
|
66
|
+
|
|
67
|
+
const parser = twig.createParser(rootHandler)
|
|
68
|
+
fs.createReadStream(`${__dirname}/node_modules/xml-twig/samples/bookstore.xml`).pipe(parser)
|
|
69
|
+
// Output -> bookstore finished after 48 lines
|
|
70
|
+
|
|
71
|
+
// Or use a Parser object instead of a Stream - works only with 'expat'!
|
|
72
|
+
const expatParser = require('./twig.js').createParser(rootHandler, { method: 'expat' })
|
|
73
|
+
expatParser.write('<html><head><title>Hello World</title></head><body><p>Foobar</p></body></html>');
|
|
74
|
+
// Output -> html finished after 1 lines
|
|
75
|
+
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
- Read XML Document in chunks
|
|
79
|
+
|
|
80
|
+
The key feature of this module is to read XML files and process them in chunks. You need to create a handler function for the elements you want to process. If you don't specify a `name` property, the handler is called on every element.
|
|
81
|
+
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
const fs = require('fs')
|
|
85
|
+
const twig = require('xml-twig')
|
|
86
|
+
|
|
87
|
+
function bookHandler(elt) {
|
|
88
|
+
console.log(`${elt.attr("category")} ${elt.name} at line ${elt.line}`)
|
|
89
|
+
elt.purge() // -> without `purge()` the entire XML document will be loaded into memory
|
|
90
|
+
}
|
|
91
|
+
|
|
92
|
+
// different styles: below `handle_book` are all equivalent (with sample file `bookstore.xml`)
|
|
93
|
+
let handle_book = [
|
|
94
|
+
{ name: 'book', function: bookHandler },
|
|
95
|
+
{ name: 'ebook', function: bookHandler }
|
|
96
|
+
];
|
|
97
|
+
handle_book = [
|
|
98
|
+
{ name: /book$/, function: bookHandler }
|
|
99
|
+
];
|
|
100
|
+
handle_book = [{
|
|
101
|
+
name: function(name,elt) { return name.endsWith('book') },
|
|
102
|
+
function: bookHandler
|
|
103
|
+
}];
|
|
104
|
+
handle_book = [{
|
|
105
|
+
name: function(name,elt) { return ['book', 'ebook'].includes(name) },
|
|
106
|
+
function: bookHandler
|
|
107
|
+
}];
|
|
108
|
+
handle_book = [{
|
|
109
|
+
name: function(name,elt) { return ['book', 'ebook'].includes(elt.name) },
|
|
110
|
+
function: bookHandler
|
|
111
|
+
}];
|
|
112
|
+
|
|
113
|
+
const parser = twig.createParser(handle_book)
|
|
114
|
+
fs.createReadStream(`${__dirname}/node_modules/xml-twig/samples/bookstore.xml`).pipe(parser)
|
|
115
|
+
|
|
116
|
+
Output:
|
|
117
|
+
|
|
118
|
+
cooking book at line 8
|
|
119
|
+
children book at line 15
|
|
120
|
+
fantasy ebook at line 23
|
|
121
|
+
web book at line 34
|
|
122
|
+
biography ebook at line 42
|
|
123
|
+
web book at line 48
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
- Read every element from XML Document
|
|
127
|
+
|
|
128
|
+
Omit the `name` property if you want to read every element one by one:
|
|
129
|
+
|
|
130
|
+
|
|
131
|
+
```
|
|
132
|
+
const fs = require('fs')
|
|
133
|
+
const twig = require('xml-twig')
|
|
134
|
+
|
|
135
|
+
function anyHandler(elt) {
|
|
136
|
+
console.log(`${' '.repeat(elt.level)}${elt.name} => "${elt.text ?? ''}" at line ${elt.line}`)
|
|
137
|
+
elt.purge() // -> without `purge()` the entire XML document will be loaded into memory
|
|
138
|
+
|
|
139
|
+
// Be aware if you run methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. on the current element.
|
|
140
|
+
// Such calls return an empty result, because the following elements have not yet been read from the XML file.
|
|
141
|
+
// You must navigate to an earlier element, e.g.
|
|
142
|
+
// elt.root().children()[0].followingSibling()
|
|
143
|
+
}
|
|
144
|
+
|
|
145
|
+
const handle_any = [ { function: anyHandler } ];
|
|
146
|
+
|
|
147
|
+
// or use regular expression which matches every element, if you prefer
|
|
148
|
+
// const handle_any = [ { name: /./, function: anyHandler } ];
|
|
149
|
+
|
|
150
|
+
const parser = twig.createParser(handle_any)
|
|
151
|
+
fs.createReadStream(`${__dirname}/node_modules/xml-twig/samples/bookstore.xml`).pipe(parser)
|
|
152
|
+
|
|
153
|
+
Output:
|
|
154
|
+
|
|
155
|
+
title => "Everyday Italian" at line 4
|
|
156
|
+
author => "Giada De Laurentiis" at line 5
|
|
157
|
+
year => "2005" at line 6
|
|
158
|
+
price => "30.00" at line 7
|
|
159
|
+
book => "" at line 8
|
|
160
|
+
title => "Harry Potter" at line 10
|
|
161
|
+
author => "J K. Rowling" at line 11
|
|
162
|
+
year => "2005" at line 12
|
|
163
|
+
price => "29.99" at line 13
|
|
164
|
+
book => "" at line 14
|
|
165
|
+
... some more
|
|
166
|
+
bookstore => "" at line 48
|
|
167
|
+
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
|
|
171
|
+
- Read only parts from XML Document
|
|
172
|
+
|
|
173
|
+
If you like to read only certain elements, use option `partial: true`. The `root` element is always read.
|
|
174
|
+
|
|
175
|
+
|
|
176
|
+
This sample program reads the `root` element and the `<ebook>` elements (including their child elements), plus the branches needed to reach them.
|
|
177
|
+
|
|
178
|
+
```
|
|
179
|
+
const handle_ebook = [{ name: 'ebook', function: ebookHandler }];
|
|
180
|
+
const parser = require('./twig.js').createParser(handle_ebook, { partial: true })
|
|
181
|
+
fs.createReadStream(`${__dirname}/samples/bookstore.xml`).pipe(parser);
|
|
182
|
+
|
|
183
|
+
function ebookHandler(elt) {
|
|
184
|
+
console.log( elt.root().writer(' ').toString() );
|
|
185
|
+
elt.purge();
|
|
186
|
+
}
|
|
187
|
+
|
|
188
|
+
Output:
|
|
189
|
+
|
|
190
|
+
<bookstore>
|
|
191
|
+
<ebook category="fantasy">
|
|
192
|
+
<title lang="en">Harry Potter</title>
|
|
193
|
+
<author>Joanne Kathleen Rowling</author>
|
|
194
|
+
<year>2001</year>
|
|
195
|
+
<price>12.99</price>
|
|
196
|
+
<format>Kindle</format>
|
|
197
|
+
<device>ePub</device>
|
|
198
|
+
</ebook>
|
|
199
|
+
</bookstore>
|
|
200
|
+
|
|
201
|
+
<bookstore>
|
|
202
|
+
<ebook category="biography">
|
|
203
|
+
<title lang="en">The Autobiography of Benjamin Franklin</title>
|
|
204
|
+
<author>Benjamin Franklin</author>
|
|
205
|
+
<year>1996</year>
|
|
206
|
+
<price>39.99</price>
|
|
207
|
+
<format>Kindle</format>
|
|
208
|
+
<device>ePub</device>
|
|
209
|
+
</ebook>
|
|
210
|
+
</bookstore>
|
|
211
|
+
|
|
212
|
+
```
|
|
213
|
+
|
|
214
|
+
For details about other options, see [ParserOptions](./doc/twig.md#ParserOptions)
|
|
215
|
+
|
|
216
|
+
|
|
217
|
+
### Access elements and attributes
|
|
218
|
+
|
|
219
|
+
#### Get XML Attributes
|
|
220
|
+
|
|
221
|
+
`.hasAttribute(name)`: Checks if the attribute exists and returns `true` or `false`
|
|
222
|
+
|
|
223
|
+
`.attr(cond)`: Returns the value of the attribute. If more than one attribute matches, it returns all matching attributes as an object
|
|
224
|
+
|
|
225
|
+
`.attribute(cond)`: Get attributes as object or `null` if no matching attribute was found. If `cond` is `undefined`, then all attributes are returned.
|
|
226
|
+
|
|
227
|
+
Specify attribute name or regular expression or custom condition. For details see [AttributeCondition](./doc/twig.md#AttributeCondition).<br>
|
|
228
|
+
Let's assume an XML element like this: `<person firstName="Jean-Luc" lastName="Picard" age="59" />`
|
|
229
|
+
|
|
230
|
+
Here are some examples of how to get attributes and their values:
|
|
231
|
+
```
|
|
232
|
+
.hasAttribute('foo') => false
|
|
233
|
+
.hasAttribute('age') => true
|
|
234
|
+
|
|
235
|
+
.attr('lastName') => Picard
|
|
236
|
+
.attr(/^first/) => Jean-Luc
|
|
237
|
+
.attr(/name/i) => { "firstName": "Jean-Luc", "lastName": "Picard" }
|
|
238
|
+
.attr(key => { return ['firstName', 'lastName'].includes(key) }) => { "firstName": "Jean-Luc", "lastName": "Picard" }
|
|
239
|
+
|
|
240
|
+
.attribute() => { "firstName": "Jean-Luc", "lastName": "Picard", "age":59 }
|
|
241
|
+
.attribute("FIRSTNAME") => null
|
|
242
|
+
.attribute("firstName") => { "firstName": "Jean-Luc" }
|
|
243
|
+
.attribute(/name/i) => { "firstName": "Jean-Luc", "lastName": "Picard" }
|
|
244
|
+
|
|
245
|
+
.attribute(key => { return ['firstName', 'lastName'].includes(key) }) => { "firstName": "Jean-Luc", "lastName": "Picard" }
|
|
246
|
+
.attribute(key => { return key.includes('Name') }) => { "firstName": "Jean-Luc", "lastName": "Picard" }
|
|
247
|
+
|
|
248
|
+
.attribute((key, val) => { return key === 'age' && val > 50 }) => { "age": 59 }
|
|
249
|
+
```
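The filtering rules shown in these examples can be sketched with a plain object. This is not the library's implementation: `filterAttributes` is a hypothetical stand-in for `.attribute(cond)`, and keeping the attribute values as strings is an assumption made for this sketch.

```javascript
// Hypothetical stand-in for `.attribute(cond)`; not the library's actual code.
function filterAttributes(attributes, cond) {
  const entries = Object.entries(attributes).filter(([key, value]) => {
    if (cond === undefined) return true;               // no condition: keep all
    if (typeof cond === 'string') return key === cond; // exact, case-sensitive name
    if (cond instanceof RegExp) return cond.test(key); // name matches the pattern
    if (typeof cond === 'function') return Boolean(cond(key, value)); // custom filter
    return false;
  });
  // Mirror the documented behaviour: `null` when nothing matches.
  return entries.length > 0 ? Object.fromEntries(entries) : null;
}

// Assumption: attribute values are kept as strings here.
const person = { firstName: 'Jean-Luc', lastName: 'Picard', age: '59' };

console.log(filterAttributes(person, /name/i));     // { firstName: 'Jean-Luc', lastName: 'Picard' }
console.log(filterAttributes(person, 'FIRSTNAME')); // null
console.log(filterAttributes(person, (key, val) => key === 'age' && val > 50)); // { age: '59' }
```

A string condition is compared case-sensitively, which is why `'FIRSTNAME'` finds nothing while the case-insensitive pattern `/name/i` matches both name attributes.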
|
|
250
|
+
|
|
251
|
+
#### Twig Methods, accessing XML Elements
|
|
252
|
+
|
|
253
|
+
`.root()` - **Twig**: The topmost element of the tree
|
|
254
|
+
|
|
255
|
+
`.self()` - **Twig**: The current element
|
|
256
|
+
|
|
257
|
+
`.parent()` - **Twig**: The parent of the current element
|
|
258
|
+
|
|
259
|
+
`.children(condition)` - **Twig[]**: All matching children of the current element or empty array
|
|
260
|
+
|
|
261
|
+
`.next(condition)` - **Twig**: Returns the next element (optionally matching condition). This is defined as the next element that opens after the current element opens, which usually means the first child of the element. Counter-intuitive as it may look, this allows you to loop through the whole document by starting from the `root`.
|
|
262
|
+
|
|
263
|
+
`.previous(condition)` - **Twig**: Returns the previous element (optionally matching condition). This is the first element that opens before the current one. It is usually either the last descendant of the previous sibling or simply the parent.
|
|
264
|
+
|
|
265
|
+
`.first(condition)` - **Twig**: Returns the first elt (optionally matching condition) element. Usually the `root` element.
|
|
266
|
+
|
|
267
|
+
`.last(condition)` - **Twig**: Returns the last elt (optionally matching condition) element. Usually this is root element.
|
|
268
|
+
|
|
269
|
+
`.ancestor(condition)` - **Twig[]**: All ancestors (parent, grandparent, etc.) of the current element (optionally matching condition) or an empty array.
|
|
270
|
+
|
|
271
|
+
`.ancestorOrSelf(condition)` - **Twig[]**: All ancestors (parent, grandparent, etc.) of the current element and the current element itself (optionally matching condition) or an empty array.
|
|
272
|
+
|
|
273
|
+
`.descendant(condition)` - **Twig[]**: All descendants (children, grandchildren, etc.) of the current element (optionally matching condition) or an empty array.
|
|
274
|
+
|
|
275
|
+
`.descendantOrSelf(condition)` - **Twig[]**: All descendants (children, grandchildren, etc.) of the current element and the current element itself (optionally matching condition) or an empty array.
|
|
276
|
+
|
|
277
|
+
`.sibling(condition)` - **Twig[]**: All siblings (optionally matching condition) before and after the current element or an empty array.
|
|
278
|
+
|
|
279
|
+
`.siblingOrSelf(condition)` - **Twig[]**: All siblings (optionally matching condition) before and after the current element and the current element itself, or an empty array.
|
|
280
|
+
|
|
281
|
+
`.followingSibling(condition)` - **Twig[]**: All siblings (optionally matching condition) after the current element or an empty array.
|
|
282
|
+
|
|
283
|
+
`.precedingSibling(condition)` - **Twig[]**: All siblings (optionally matching condition) before the current element or an empty array.
|
|
284
|
+
|
|
285
|
+
`.nextSibling(condition)` - **Twig**: Returns the next (optionally matching condition) sibling element.
|
|
286
|
+
|
|
287
|
+
`.prevSibling(condition)` - **Twig**: Returns the previous (optionally matching condition) sibling element.
|
|
288
|
+
|
|
289
|
+
`.find(condition)` - **Twig**: Find a specific element in current element and returns the first match. In principle `.descendant(condition)[0]`
|
|
290
|
+
|
|
291
|
+
`.purge()` - void: Removes the current element from the tree. Usually this method is called after the element has been processed and is no longer needed.
|
|
292
|
+
|
|
293
|
+
`.purgeUpTo(elt)` - void: Purges up to the elt element. This allows you to keep part of the tree in memory when you purge.
|
|
294
|
+
|
|
295
|
+
`.writer(indented|xw)` - **XMLWriter**: Returns a [XMLWriter](https://www.npmjs.com/package/xml-writer) object you can use to print the currently loaded XML tree.<br>Instead of providing an indented parameter (`true`, `false` or indent character) you can also provide an `XMLWriter` object which adds more flexibility.
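The document-order definition given for `.next()` above can be pictured with a small plain-object tree. This is only an illustration: `node` and `nextInDocumentOrder` are hypothetical helpers, not part of the `Twig` class, and the sketch ignores the optional condition.

```javascript
// Plain-object tree used as a stand-in for Twig elements (illustration only).
function node(name, children = []) {
  const self = { name, children, parent: null };
  for (const child of children) child.parent = self;
  return self;
}

// Hypothetical helper: the element that opens next in document order.
function nextInDocumentOrder(elt) {
  if (elt.children.length > 0) return elt.children[0]; // first child opens next
  // Otherwise climb until the element or one of its ancestors has a following sibling.
  for (let current = elt; current.parent !== null; current = current.parent) {
    const siblings = current.parent.children;
    const index = siblings.indexOf(current);
    if (index < siblings.length - 1) return siblings[index + 1];
  }
  return null; // nothing opens after the last element
}

const root = node('bookstore', [
  node('book', [node('title'), node('author')]),
  node('ebook', [node('title')]),
]);

// Starting at the root and stepping forward visits every element once.
const visited = [];
for (let elt = root; elt !== null; elt = nextInDocumentOrder(elt)) {
  visited.push(elt.name);
}
console.log(visited.join(' > '));
// bookstore > book > title > author > ebook > title
```

Stepping from the root in this way is a pre-order walk, which is why starting at `root` reaches the whole document.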
|
|
296
|
+
|
|
297
|
+
**condition** Parameter
|
|
298
|
+
|
|
299
|
+
You can pass a condition to the methods above. Elements can be filtered by the following kinds of condition:
|
|
300
|
+
|
|
301
|
+
- If `undefined`, then all elements are returned.
|
|
302
|
+
|
|
303
|
+
- If `string` then the element name must be equal to the string
|
|
304
|
+
|
|
305
|
+
Example: `"book"`
|
|
306
|
+
|
|
307
|
+
- If `RegExp` then the element name must match the Regular Expression
|
|
308
|
+
|
|
309
|
+
Example: `/book$/i`
|
|
310
|
+
|
|
311
|
+
- With `ElementConditionFilter` you can specify any custom filter function.<br>
|
|
312
|
+
|
|
313
|
+
Example: `(name, elt) => { return name === 'book' && elt.children().length > 1 }`
|
|
314
|
+
|
|
315
|
+
- With a `Twig` object, you can specify the element directly. Apart from `purgeUpTo(elt)`, this is rarely used, because once you know the element there is no reason to find it again.
|
|
316
|
+
|
|
317
|
+
Example: `elt.children()[2]`
|
|
318
|
+
|
|
319
|
+
|
|
320
|
+
For details see [ElementCondition](./doc/twig.md#ElementCondition).
|
|
321
|
+
|
|
322
|
+
For methods which return a **Twig[]** array, a call like `elt.sibling("book")` is equivalent to `elt.sibling().filter( x => x.name === "book" )`
|
|
323
|
+
|
|
324
|
+
For methods which return a single **Twig** element (e.g. `elt.next("book")`) the method is executed in a loop till a `<book>` element is found.
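The condition kinds listed above, and the difference between array-returning and single-element methods, can be sketched as follows. This is a minimal illustration on plain objects. `matchesCondition` and `firstMatch` are hypothetical names and do not reflect how the library is implemented internally.

```javascript
// Hypothetical helper: does `elt` satisfy `condition`? Not the library's code.
function matchesCondition(elt, condition) {
  if (condition === undefined) return true;                          // no filter
  if (typeof condition === 'string') return elt.name === condition;  // exact name
  if (condition instanceof RegExp) return condition.test(elt.name);  // pattern on name
  if (typeof condition === 'function') return Boolean(condition(elt.name, elt)); // custom
  return elt === condition;                                          // a concrete element
}

// Plain objects standing in for Twig elements.
const siblings = [{ name: 'book' }, { name: 'ebook' }, { name: 'magazine' }, { name: 'book' }];

// Array-returning methods: filter the full result.
const books = siblings.filter((elt) => matchesCondition(elt, /book$/));
console.log(books.map((elt) => elt.name)); // [ 'book', 'ebook', 'book' ]

// Single-element methods: step forward until the first match.
function firstMatch(elements, condition) {
  for (const elt of elements) {
    if (matchesCondition(elt, condition)) return elt;
  }
  return null;
}
console.log(firstMatch(siblings, 'ebook').name); // ebook
console.log(firstMatch(siblings, 'journal'));    // null
```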
|
|
325
|
+
|
|
326
|
+
|
|
327
|
+
#### Twig Properties
|
|
328
|
+
|
|
329
|
+
`.isEmpty` - **boolean**: `true` if empty. An empty element has neither text nor child elements; however, empty elements can have attributes.
|
|
330
|
+
|
|
331
|
+
`.level` - **integer**: The level of the element. Root element has 0, children have 1, grand-children 2 and so on
|
|
332
|
+
|
|
333
|
+
`.isRoot` - **boolean**: `true` for the root element
|
|
334
|
+
|
|
335
|
+
`.hasChildren` - **boolean**: `true` if the element has any child elements
|
|
336
|
+
|
|
337
|
+
`.isFirstChild` - **boolean**: `true` if the element is the first child in the parent
|
|
338
|
+
|
|
339
|
+
`.isLastChild` - **boolean**: `true` if the element is the last child in the parent
|
|
340
|
+
|
|
341
|
+
`.line` - **integer**: The line of the element (where the closing tag appears) in the XML-File. First line is 1
|
|
342
|
+
|
|
343
|
+
`.column` - **integer**: The column of the element (where the closing tag appears) in the XML-File. First column is 1
|
|
344
|
+
|
|
345
|
+
`.name` - **string**: Name of the element/tag
|
|
346
|
+
|
|
347
|
+
`.tag` - **string**: Synonym for `name`
|
|
348
|
+
|
|
349
|
+
`.text` - **string**: The text of an element, no matter whether it is given as a CDATA section or as a plain character data node (PCDATA)
|
|
350
|
+
|
|
351
|
+
`.attributes` - **object**: All attributes of the object
|
|
352
|
+
|
|
353
|
+
`.comment` - **string|string[]**: Comments or array of comments inside the element
|
|
354
|
+
|
|
355
|
+
`.declaration` - **object**: The XML declaration object; exists only on `root`.
|
|
356
|
+
|
|
357
|
+
Example `{version: '1.0', encoding: 'UTF-8'}`.
|
|
358
|
+
|
|
359
|
+
`.PI` - **object**: Processing instruction; exists only on `root`.
|
|
360
|
+
|
|
361
|
+
Example `{ target: 'xml-stylesheet', data: 'type="text/xsl" href="style.xsl"' }`.
|
|
362
|
+
|
|
363
|
+
`.namespace` - **object**: Namespace of the element or `null`. Only available if parsed with option `xmlns: true`.
|
|
364
|
+
|
|
365
|
+
Example `{ local: 'h', uri: 'http://www.w3.org/TR/html4/' }`
|
|
366
|
+
|
|
367
|
+
|
|
368
|
+
## Limitations
|
|
369
|
+
|
|
370
|
+
The `xml-twig` module focuses on reading XML files. In principle it would be possible to create an XML file from scratch with the [Twig](./doc/twig.md#Twig) class. However, I think there are better modules available for that. Create/update/delete methods are rather limited. Perhaps I will add them in a later release.
|
|
371
|
+
|
|
372
|
+
Accessing Twig elements with the [XML-Path](https://www.w3.org/TR/xpath/) language is not supported. One reason is that the `Twig` class models an [Element](https://www.w3schools.com/xml/xml_elements.asp) rather than a [Node](https://www.w3schools.com/xml/dom_nodes.asp), which would be more generic.
|
|
373
|
+
|
|
374
|
+
|
|
375
|
+
|
|
376
|
+
|
|
377
|
+
|
|
378
|
+
|
package/demo/demo.js
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
const fs = require('fs');
|
|
2
|
+
const process = require('process');
|
|
3
|
+
|
|
4
|
+
const parser = require('xml-twig').createParser({ name: /book$/, function: bookHandler }, { method: 'sax' })
|
|
5
|
+
fs.createReadStream(`${__dirname}/../samples/bookstore.xml`).pipe(parser)
|
|
6
|
+
|
|
7
|
+
|
|
8
|
+
function bookHandler(elt) {
|
|
9
|
+
console.log(`${elt.attr("category")} ${elt.name} at line ${elt.line}`)
|
|
10
|
+
elt.purge()
|
|
11
|
+
}
|
|
12
|
+
|
|
13
|
+
|
|
14
|
+
function rootHandler(elt) {
|
|
15
|
+
console.log(`${elt.name} finished after ${elt.line} lines`);
|
|
16
|
+
}
|
|
17
|
+
|
|
18
|
+
|
|
package/demo/memory-test.js
ADDED
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
const fs = require('fs');
|
|
2
|
+
const process = require('process');
|
|
3
|
+
|
|
4
|
+
let NE = 0;
|
|
5
|
+
console.log('Starting...')
|
|
6
|
+
let parser = require('xml-twig').createParser([{ name: 'subsession', function: anyHandler }], { method: 'expat' })
|
|
7
|
+
let reader = fs.createReadStream(`${__dirname}/../samples/20231019015552.1-MSRAN.xml`);
|
|
8
|
+
reader.pipe(parser);
|
|
9
|
+
|
|
10
|
+
function anyHandler(elt) {
|
|
11
|
+
NE++;
|
|
12
|
+
if (NE % 5 === 0) {
|
|
13
|
+
for (const [key, value] of Object.entries(process.memoryUsage())) {
|
|
14
|
+
console.log(` Memory usage by ${key}, ${Math.round((value / 1024 / 1024 + Number.EPSILON) * 100) / 100} MiB`)
|
|
15
|
+
}
|
|
16
|
+
}
|
|
17
|
+
//elt.purge();
|
|
18
|
+
}
|
|
19
|
+
|
|
20
|
+
reader.on('end', () => {
|
|
21
|
+
console.log(`All done`);
|
|
22
|
+
});
|
|
23
|
+
|
|
24
|
+
|
|
25
|
+
|
|
26
|
+
/*
|
|
27
|
+
**********************
|
|
28
|
+
* Results
|
|
29
|
+
**********************
|
|
30
|
+
|
|
31
|
+
NODE_OPTIONS=--max-old-space-size=4096
|
|
32
|
+
|
|
33
|
+
5 NE
|
|
34
|
+
Memory usage by rss, 1070.65 MiB
|
|
35
|
+
Memory usage by heapTotal, 1025.37 MiB
|
|
36
|
+
Memory usage by heapUsed, 989.72 MiB
|
|
37
|
+
Memory usage by external, 1.24 MiB
|
|
38
|
+
Memory usage by arrayBuffers, 0.83 MiB
|
|
39
|
+
10 NE
|
|
40
|
+
Memory usage by rss, 2816.7 MiB
|
|
41
|
+
Memory usage by heapTotal, 2760.62 MiB
|
|
42
|
+
Memory usage by heapUsed, 2690.95 MiB
|
|
43
|
+
Memory usage by external, 1.42 MiB
|
|
44
|
+
Memory usage by arrayBuffers, 1.02 MiB
|
|
45
|
+
|
|
46
|
+
<--- Last few GCs --->
|
|
47
|
+
|
|
48
|
+
[6508:00000223570D04D0] 20692 ms: Scavenge 4034.0 (4122.1) -> 4032.4 (4135.1) MB, 13.7 / 0.0 ms (average mu = 0.431, current mu = 0.348) allocation failure;
|
|
49
|
+
[6508:00000223570D04D0] 23749 ms: Mark-sweep 4046.7 (4135.1) -> 4044.6 (4150.4) MB, 3030.4 / 0.0 ms (average mu = 0.250, current mu = 0.055) allocation failure; scavenge might not succeed
|
|
50
|
+
|
|
51
|
+
|
|
52
|
+
<--- JS stacktrace --->
|
|
53
|
+
|
|
54
|
+
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
|
|
55
|
+
1: 00007FF7BCDB9E7F node_api_throw_syntax_error+175967
|
|
56
|
+
2: 00007FF7BCD40C06 SSL_get_quiet_shutdown+65750
|
|
57
|
+
3: 00007FF7BCD41FC2 SSL_get_quiet_shutdown+70802
|
|
58
|
+
4: 00007FF7BD7DA214 v8::Isolate::ReportExternalAllocationLimitReached+116
|
|
59
|
+
5: 00007FF7BD7C5572 v8::Isolate::Exit+674
|
|
60
|
+
6: 00007FF7BD6473CC v8::internal::EmbedderStackStateScope::ExplicitScopeForTesting+124
|
|
61
|
+
7: 00007FF7BD6445EB v8::internal::Heap::CollectGarbage+3963
|
|
62
|
+
8: 00007FF7BD65A823 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath+2099
|
|
63
|
+
9: 00007FF7BD65B0CD v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath+93
|
|
64
|
+
10: 00007FF7BD66A903 v8::internal::Factory::NewFillerObject+851
|
|
65
|
+
11: 00007FF7BD35BEB5 v8::internal::DateCache::Weekday+1349
|
|
66
|
+
12: 00007FF7BD8778B1 v8::internal::SetupIsolateDelegate::SetupHeap+558193
|
|
67
|
+
13: 00007FF73D9F10A1
|
|
68
|
+
|
|
69
|
+
|
|
70
|
+
*/
|
|
71
|
+
|
|
package/demo/speed-test.js
ADDED
|
@@ -0,0 +1,66 @@
|
|
|
1
|
+
const { DateTime } = require('luxon');
|
|
2
|
+
const startTime = DateTime.now();
|
|
3
|
+
const fs = require('fs');
|
|
4
|
+
|
|
5
|
+
let NE = 0;
|
|
6
|
+
console.log('Starting...')
|
|
7
|
+
let parser = require('xml-twig').createParser([{ name: 'subsession', function: anyHandler }], { method: 'expat' })
|
|
8
|
+
let reader = fs.createReadStream(`${__dirname}/../samples/20231019015552.1-MSRAN.xml`);
|
|
9
|
+
reader.pipe(parser);
|
|
10
|
+
|
|
11
|
+
function anyHandler(elt) {
|
|
12
|
+
NE++;
|
|
13
|
+
if (NE % 25 === 0) {
|
|
14
|
+
let d = DateTime.now().diff(startTime);
|
|
15
|
+
console.log(`${NE} NE in ${d.toFormat('mm:ss.S')}`);
|
|
16
|
+
}
|
|
17
|
+
elt.purge();
|
|
18
|
+
}
|
|
19
|
+
|
|
20
|
+
reader.on('end', () => {
|
|
21
|
+
let d = DateTime.now().diff(startTime);
|
|
22
|
+
console.log(`All done in ${d.toFormat('mm:ss.S')}`);
|
|
23
|
+
});
|
|
24
|
+
|
|
25
|
+
|
|
26
|
+
|
|
27
|
+
/*
|
|
28
|
+
**********************
|
|
29
|
+
* Results
|
|
30
|
+
**********************
|
|
31
|
+
|
|
32
|
+
Node.js with expat:
|
|
33
|
+
25 NE in 00:16.321
|
|
34
|
+
50 NE in 00:27.867
|
|
35
|
+
75 NE in 00:52.757
|
|
36
|
+
100 NE in 01:11.297
|
|
37
|
+
125 NE in 01:29.996
|
|
38
|
+
150 NE in 01:46.358
|
|
39
|
+
175 NE in 02:02.486
|
|
40
|
+
200 NE in 02:21.25
|
|
41
|
+
All done in 02:21.31
|
|
42
|
+
|
|
43
|
+
Node.js with sax:
|
|
44
|
+
25 NE in 00:32.988
|
|
45
|
+
50 NE in 00:53.964
|
|
46
|
+
75 NE in 01:38.18
|
|
47
|
+
100 NE in 02:12.977
|
|
48
|
+
125 NE in 02:47.781
|
|
49
|
+
150 NE in 03:17.601
|
|
50
|
+
175 NE in 03:48.676
|
|
51
|
+
200 NE in 04:22.523
|
|
52
|
+
All done in 04:22.528
|
|
53
|
+
|
|
54
|
+
|
|
55
|
+
Good old Perl XML::Twig
|
|
56
|
+
25 NE in 1:14
|
|
57
|
+
50 NE in 2:03
|
|
58
|
+
75 NE in 3:43
|
|
59
|
+
100 NE in 5:02
|
|
60
|
+
125 NE in 6:12
|
|
61
|
+
150 NE in 7:16
|
|
62
|
+
175 NE in 8:17
|
|
63
|
+
200 NE in 9:24
|
|
64
|
+
All done in 9:24
|
|
65
|
+
|
|
66
|
+
*/
|