xml-twig 1.0.6 → 1.1.0
- package/README.md +80 -83
- package/doc/twig.md +187 -144
- package/package.json +3 -2
- package/samples/sample.js +88 -0
- package/twig.js +164 -187
- package/demo/demo.js +0 -18
- /package/{demo → samples}/memory-test.js +0 -0
- /package/{demo → samples}/speed-test.js +0 -0
package/README.md
CHANGED
|
@@ -5,7 +5,7 @@ Inspired by Perl module [XML::Twig](https://metacpan.org/pod/XML::Twig)
|
|
|
5
5
|
|
|
6
6
|
|
|
7
7
|
## When should I use this, motivation of this module
|
|
8
|
-
When you need to read a XML file, then you have two
|
|
8
|
+
When you need to read an XML file, you can choose between two principles:
|
|
9
9
|
|
|
10
10
|
* The **Document Object Model (DOM)** style. These parsers read the entire XML document into memory. Usually they provide easy methods to navigate the document tree or to modify it.
|
|
11
11
|
|
|
@@ -20,20 +20,19 @@ This module tries to combine both principles. The XML document can be read in ch
|
|
|
20
20
|
## Dependencies
|
|
21
21
|
XML documents are read with either the [sax](https://www.npmjs.com/package/sax) or the [node-expat](https://www.npmjs.com/package/node-expat) parser. More parsers may be added in future releases. By default the `sax` parser is used.
|
|
22
22
|
|
|
23
|
-
**NOTE: The `
|
|
23
|
+
**NOTE: The `node-expat` module is not automatically installed with this module. Install the parser yourself if you want to use it.**
|
|
24
24
|
|
|
25
25
|
## Installation
|
|
26
26
|
|
|
27
|
-
Install module like any other node module and
|
|
27
|
+
Install module like any other node module and optionally `node-expat`:
|
|
28
28
|
```
|
|
29
29
|
npm install xml-twig
|
|
30
30
|
|
|
31
|
-
|
|
32
|
-
# and/or
|
|
31
|
+
# and optionally
|
|
33
32
|
npm install node-expat
|
|
34
33
|
|
|
35
34
|
```
|
|
36
|
-
In my tests I parsed a 750 MB big XML file, the `node-expat` is around two times faster than `sax` (node-expat: 2:20 Minutes, sax: 4:20 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why
|
|
35
|
+
In my tests with a 750 MB XML file, `node-expat` was about twice as fast as `sax` (node-expat: 2:20 minutes, sax: 4:20 minutes). However, you may run into problems when you try to install the `node-expat` parser. That is why the `node-expat` parser is not installed automatically.
|
|
37
36
|
|
|
38
37
|
|
|
39
38
|
## How to use it
|
|
@@ -42,7 +41,7 @@ In my tests I parsed a 750 MB big XML file, the `node-expat` is around two times
|
|
|
42
41
|
|
|
43
42
|
In XML-Path, there are seven kinds of nodes: `element`, `attribute`, `text`, `namespace`, `processingInstruction`, `comment`, and `document`, see [Nodes at W3C](https://www.w3.org/TR/xpath-datamodel-31/#Node). XML documents are treated as trees of nodes.
|
|
44
43
|
|
|
45
|
-
The [Twig](./doc/twig.md#Twig) Class models a "some-kind" Element tree. I try to follow the [XML-Path](https://www.w3.org/TR/xpath/) conventions
|
|
44
|
+
The [Twig](./doc/twig.md#Twig) class models a kind of element tree. I try to follow the [XML-Path](https://www.w3.org/TR/xpath/) conventions whenever possible to avoid confusion.
|
|
46
45
|
|
|
47
46
|
|
|
48
47
|
#### XML-Namespaces
|
|
@@ -61,57 +60,66 @@ With option `{ namespaces : true }` you will get access to the `.namespace` prop
|
|
|
61
60
|
const twig = require('xml-twig')
|
|
62
61
|
|
|
63
62
|
function rootHandler(elt) {
|
|
64
|
-
console.log(
|
|
63
|
+
console.log(`<${elt.name}> finished after ${parser.currentLine} lines`);
|
|
65
64
|
}
|
|
66
65
|
|
|
67
|
-
const parser = twig.createParser(rootHandler)
|
|
68
|
-
fs.createReadStream(`${__dirname}/
|
|
69
|
-
|
|
66
|
+
const parser = twig.createParser({ tag: twig.Root, function: rootHandler }, { method: 'sax' })
|
|
67
|
+
fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
|
|
68
|
+
|
|
69
|
+
// Output -> <bookstore> finished after 48 lines
|
|
70
|
+
|
|
70
71
|
|
|
71
72
|
// Or use a Parser object instead of a Stream - works only with 'expat'!
|
|
72
|
-
const
|
|
73
|
-
|
|
73
|
+
const parser = twig.createParser({ tag: twig.Root, function: rootHandler }, { method: 'expat' })
|
|
74
|
+
parser.write('<html><head><title>Hello World</title></head><body><p>Foobar</p></body></html>');
|
|
75
|
+
|
|
74
76
|
// Output -> <html> finished after 1 lines
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
If you prefer events, use the `event` property instead of `function` in the handler declaration:
|
|
75
80
|
|
|
76
81
|
```
|
|
82
|
+
const parser = twig.createParser({ tag: twig.Root, event: 'rootElement' }, { method: 'sax' })
|
|
83
|
+
fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
|
|
84
|
+
|
|
85
|
+
parser.on('rootElement', (elt) => {
|
|
86
|
+
console.log(`<${elt.name}> finished after ${parser.currentLine} lines`);
|
|
87
|
+
})
|
|
88
|
+
```
|
|
89
|
+
|
|
77
90
|
|
|
78
91
|
- Read XML Document in chunks
|
|
79
92
|
|
|
80
|
-
The key feature of this module is to read
|
|
93
|
+
The key feature of this module is to read and process XML files in chunks. You need to create handler functions for the elements you want to process.
|
|
81
94
|
|
|
82
95
|
|
|
83
96
|
```
|
|
84
|
-
const fs = require('fs')
|
|
85
|
-
const twig = require('xml-twig')
|
|
86
|
-
|
|
87
97
|
function bookHandler(elt) {
|
|
88
|
-
console.log(`${elt.attr("category")} ${elt.name} at line ${
|
|
98
|
+
console.log(`${elt.attr("category")} ${elt.name} at line ${parser.currentLine}`)
|
|
89
99
|
elt.purge() // -> without `purge()` the entire XML document will be loaded into memory
|
|
90
100
|
}
|
|
91
101
|
|
|
92
102
|
// different styles: the `handle_book` definitions below are all equivalent (with sample file `bookstore.xml`)
|
|
93
103
|
handle_book = [
|
|
94
|
-
{
|
|
95
|
-
{
|
|
96
|
-
];
|
|
97
|
-
handle_book = [
|
|
98
|
-
{ name: /book$/, function: bookHandler }
|
|
104
|
+
{ tag: 'book', function: bookHandler },
|
|
105
|
+
{ tag: 'ebook', function: bookHandler }
|
|
99
106
|
];
|
|
107
|
+
handle_book = { tag: /book$/, function: bookHandler };
|
|
100
108
|
handle_book = [{
|
|
101
|
-
|
|
109
|
+
tag: function(name, elt) { return name.endsWith('book') },
|
|
102
110
|
function: bookHandler
|
|
103
111
|
}];
|
|
104
112
|
handle_book = [{
|
|
105
|
-
|
|
113
|
+
tag: function(name, elt) { return ['book', 'ebook'].includes(name) },
|
|
106
114
|
function: bookHandler
|
|
107
115
|
}];
|
|
108
116
|
handle_book = [{
|
|
109
|
-
|
|
117
|
+
tag: function(name, elt) { return ['book', 'ebook'].includes(elt.name) },
|
|
110
118
|
function: bookHandler
|
|
111
119
|
}];
|
|
112
120
|
|
|
113
|
-
const parser = twig.createParser(handle_book)
|
|
114
|
-
fs.createReadStream(`${__dirname}/
|
|
121
|
+
const parser = twig.createParser(handle_book, { method: 'sax' })
|
|
122
|
+
fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
|
|
115
123
|
|
|
116
124
|
Output:
|
|
117
125
|
|
|
@@ -124,31 +132,17 @@ With option `{ namespaces : true }` you will get access to the `.namespace` prop
|
|
|
124
132
|
```
|
|
125
133
|
|
|
126
134
|
- Read every element from XML Document
|
|
127
|
-
|
|
128
|
-
Skip the `name` proeprty if you like to read every element one-by-one:
|
|
129
|
-
|
|
130
|
-
|
|
135
|
+
|
|
131
136
|
```
|
|
132
|
-
const fs = require('fs')
|
|
133
|
-
const twig = require('xml-twig')
|
|
134
|
-
|
|
135
137
|
function anyHandler(elt) {
|
|
136
|
-
console.log(`${' '.repeat(elt.level)}${elt.name} => "${elt.text ?? ''}" at line ${
|
|
138
|
+
console.log(`${' '.repeat(elt.level)}${elt.name} => "${elt.text ?? ''}" at line ${parser.currentLine}`)
|
|
137
139
|
elt.purge() // -> without `purge()` the entire XML document will be loaded into memory
|
|
138
|
-
|
|
139
|
-
// Be aware if you run methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. on the current element.
|
|
140
|
-
// Such calls return emtpy result, because following element are not yet read from the XML file.
|
|
141
|
-
// You must navigate to an earlier element, e.g.
|
|
142
|
-
`elt.root().children()[0].followingSibling()`
|
|
143
140
|
}
|
|
144
141
|
|
|
145
|
-
const
|
|
146
|
-
|
|
147
|
-
// or
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
const parser = twig.createParser(handle_any)
|
|
151
|
-
fs.createReadStream(`${__dirname}/node_modules/xml-twig/samples/bookstore.xml`).pipe(parser)
|
|
142
|
+
const parser = twig.createParser({ tag: twig.Any, function: anyHandler })
|
|
143
|
+
// or with Regular Expression -> `{ tag: /./, function: anyHandler }`
|
|
144
|
+
// or with Function -> `{ tag: () => {return true}, function: anyHandler }`
|
|
145
|
+
fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
|
|
152
146
|
|
|
153
147
|
Output:
|
|
154
148
|
|
|
@@ -167,51 +161,58 @@ With option `{ namespaces : true }` you will get access to the `.namespace` prop
|
|
|
167
161
|
|
|
168
162
|
```
|
|
169
163
|
|
|
164
|
+
Be aware that methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. return an empty result when called on the current element, because the following elements have not yet been read from the XML file. You must navigate to an earlier element, e.g.<br>
|
|
165
|
+
`elt.root().children()[0].followingSibling()`
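
The caveat about following siblings can be illustrated with a small toy model. This is only a sketch under stated assumptions: the event list, the plain `{ name, parent, children }` objects and the `followingSiblings` helper are hypothetical stand-ins and do not use the xml-twig API. It records, at each closing tag, which following siblings are visible from the current element and from the first child of the root.

```javascript
// Toy event stream standing in for a parser that reads
// <bookstore><book/><ebook/></bookstore>; not the xml-twig implementation.
const events = [
  ['open', 'bookstore'], ['open', 'book'], ['close', 'book'],
  ['open', 'ebook'], ['close', 'ebook'], ['close', 'bookstore']
];

// Names of the siblings that follow `elt` among the elements read so far.
function followingSiblings(elt) {
  if (!elt.parent) return [];
  const siblings = elt.parent.children;
  return siblings.slice(siblings.indexOf(elt) + 1).map((s) => s.name);
}

const log = [];
let root = null;
let current = null;
for (const [type, name] of events) {
  if (type === 'open') {
    const elt = { name, parent: current, children: [] };
    if (current) current.children.push(elt); else root = elt;
    current = elt;
  } else {
    // A handler would run here, when the closing tag has just been read.
    const first = root.children[0];
    log.push({
      closed: current.name,
      ofCurrent: followingSiblings(current),
      ofFirstChild: first ? followingSiblings(first) : []
    });
    current = current.parent;
  }
}
console.log(log);
```

In this model the element that has just been closed never has following siblings, while the first child of the root gains one as soon as the next element has been read.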
|
|
166
|
+
|
|
170
167
|
|
|
171
168
|
- Read only parts from XML Document
|
|
172
169
|
|
|
173
170
|
If you want to read only certain elements, use option `partial: true`. The `root` element is always read.
|
|
174
171
|
|
|
175
|
-
|
|
176
|
-
This sample program reads the `root` element and `<ebook>` elements (include their children elements), and the brances to reach the element.
|
|
172
|
+
This sample program reads the `root` element and the `<ebook>` elements (including their child elements), and the branches needed to reach them.
|
|
177
173
|
|
|
178
174
|
```
|
|
179
|
-
const handle_ebook = [
|
|
180
|
-
|
|
175
|
+
const handle_ebook = [
|
|
176
|
+
{ tag: 'ebook', function: ebookHandler },
|
|
177
|
+
{ tag: twig.Root, function: rootHandler }
|
|
178
|
+
];
|
|
179
|
+
const parser = twig.createParser(handle_ebook, { partial: true })
|
|
181
180
|
fs.createReadStream(`${__dirname}/samples/bookstore.xml`).pipe(parser);
|
|
182
181
|
|
|
183
182
|
function ebookHandler(elt) {
|
|
184
|
-
console.log(
|
|
185
|
-
|
|
183
|
+
console.log(`${elt.name} at line ${parser.currentLine}`)
|
|
184
|
+
}
|
|
185
|
+
|
|
186
|
+
function rootHandler(elt) {
|
|
187
|
+
console.log( elt.writer(' ').toString() );
|
|
186
188
|
}
|
|
187
189
|
|
|
190
|
+
|
|
188
191
|
Output:
|
|
189
192
|
|
|
193
|
+
ebook at line 23
|
|
194
|
+
ebook at line 41
|
|
190
195
|
<bookstore>
|
|
191
196
|
<ebook category="fantasy">
|
|
192
|
-
|
|
193
|
-
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
197
|
-
|
|
197
|
+
<title lang="en">Harry Potter</title>
|
|
198
|
+
<author>Joanne Kathleen Rowling</author>
|
|
199
|
+
<year>2001</year>
|
|
200
|
+
<price>12.99</price>
|
|
201
|
+
<format>Kindle</format>
|
|
202
|
+
<device>ePub</device>
|
|
198
203
|
</ebook>
|
|
199
|
-
</bookstore>
|
|
200
|
-
|
|
201
|
-
<bookstore>
|
|
202
204
|
<ebook category="biography">
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
205
|
+
<title lang="en">The Autobiography of Benjamin Franklin</title>
|
|
206
|
+
<author>Benjamin Franklin</author>
|
|
207
|
+
<year>1996</year>
|
|
208
|
+
<price>39.99</price>
|
|
209
|
+
<format>Kindle</format>
|
|
210
|
+
<device>ePub</device>
|
|
209
211
|
</ebook>
|
|
210
212
|
</bookstore>
|
|
211
|
-
|
|
212
213
|
```
|
|
213
214
|
|
|
214
|
-
For details
|
|
215
|
+
For details and other options, see [ParserOptions](./doc/twig.md#ParserOptions) and [TwigHandler](./doc/twig.md#TwigHandler)
|
|
215
216
|
|
|
216
217
|
|
|
217
218
|
### Access elements and attributes
|
|
@@ -220,9 +221,9 @@ For details about other options, see [ParserOptions](./doc/twig.md#ParserOptions
|
|
|
220
221
|
|
|
221
222
|
`.hasAttribute(name)`: Checks if the attribute exists and returns `true` or `false`
|
|
222
223
|
|
|
223
|
-
`.attr(
|
|
224
|
+
`.attr(condition)`: Returns the value of the attribute. If more than one attribute matches, it returns them as an object
|
|
224
225
|
|
|
225
|
-
`.attribute(
|
|
226
|
+
`.attribute(condition)`: Returns the attributes as an object, or `null` if no matching attribute was found. If `condition` is `undefined`, then all attributes are returned.
|
|
226
227
|
|
|
227
228
|
Specify attribute name or regular expression or custom condition. For details see [AttributeCondition](./doc/twig.md#AttributeCondition).<br>
|
|
228
229
|
Let's assume an XML element like this: `<person firstName="Jean-Luc" lastName="Picard" age="59" />`
|
|
@@ -288,7 +289,7 @@ Here are some examples the get attribute and values:
|
|
|
288
289
|
|
|
289
290
|
`.find(condition)` - **Twig**: Finds a specific element within the current element and returns the first match. In principle, the same as `.descendant(condition)[0]`
|
|
290
291
|
|
|
291
|
-
`.purge()` - void: Removes the current element from tree. Usually this
|
|
292
|
+
`.purge()` - void: Removes the current element from the tree. Usually this method is called after the element has been processed and is no longer needed.
|
|
292
293
|
|
|
293
294
|
`.purgeUpTo(elt)` - void: Purges up to the elt element. This allows you to keep part of the tree in memory when you purge.
|
|
294
295
|
|
|
@@ -308,11 +309,11 @@ You can specify condition on above methods. You can filter elements by following
|
|
|
308
309
|
|
|
309
310
|
Example: `/book$/i`
|
|
310
311
|
|
|
311
|
-
- With `ElementConditionFilter` you can
|
|
312
|
+
- With `ElementConditionFilter` you can specify any custom filter function.<br>
|
|
312
313
|
|
|
313
314
|
Example: `(name, elt) => { return name === 'book' && elt.children().length > 1 }`
|
|
314
315
|
|
|
315
|
-
- With a `Twig` object, you can specify the element
|
|
316
|
+
- With a `Twig` object, you can specify the element directly. Apart from `purgeUpTo(elt)`, it is rarely used, because when you know the element then there is no reason to find it again.
|
|
316
317
|
|
|
317
318
|
Example: `elt.children()[2]`
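
The condition styles listed above can be sketched as a single matching function. This is a minimal illustration, not the library's implementation: `matchesCondition` is a hypothetical helper, plain objects stand in for `Twig` elements, and `children` is an array property here rather than the `children()` method. Matching by plain name is assumed from calls like `elt.next("book")`.

```javascript
// Hypothetical helper, not part of xml-twig: tests one element against a
// condition given as a name, a RegExp, a filter function or an element.
function matchesCondition(condition, elt) {
  if (typeof condition === 'string') return elt.name === condition;
  if (condition instanceof RegExp) return condition.test(elt.name);
  if (typeof condition === 'function') return Boolean(condition(elt.name, elt));
  return condition === elt; // an element object matches only itself
}

// Plain objects stand in for Twig elements in this sketch.
const book = { name: 'book', children: [{ name: 'title' }, { name: 'author' }] };
const ebook = { name: 'ebook', children: [{ name: 'title' }] };
const elements = [book, ebook];

const conditions = {
  name: 'book',
  regexp: /book$/i,
  filter: (name, elt) => name === 'book' && elt.children.length > 1,
  element: ebook
};

// Collect the names of the elements that satisfy each kind of condition.
const results = {};
for (const [kind, condition] of Object.entries(conditions)) {
  results[kind] = elements
    .filter((elt) => matchesCondition(condition, elt))
    .map((elt) => elt.name);
}
console.log(results);
```

The order of the checks matters only in that the element comparison comes last, as the fallback for anything that is not a string, a RegExp or a function.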
|
|
318
319
|
|
|
@@ -326,7 +327,7 @@ For methods which return a single **Twig** element (e.g. `elt.next("book")`) the
|
|
|
326
327
|
|
|
327
328
|
#### Twig Properties
|
|
328
329
|
|
|
329
|
-
`.isEmpty` - **boolean**: `true` if
|
|
330
|
+
`.isEmpty` - **boolean**: `true` if empty. An empty element has neither text nor child elements; however, empty elements can have attributes.
|
|
330
331
|
|
|
331
332
|
`.level` - **integer**: The level of the element. Root element has 0, children have 1, grand-children 2 and so on
|
|
332
333
|
|
|
@@ -338,15 +339,11 @@ For methods which return a single **Twig** element (e.g. `elt.next("book")`) the
|
|
|
338
339
|
|
|
339
340
|
`.isLastChild` - **boolean**: `true` if the element is the last child in the parent
|
|
340
341
|
|
|
341
|
-
`.line` - **integer**: The line of the element (where the closing tag appears) in the XML-File. First line is 1
|
|
342
|
-
|
|
343
|
-
`.column` - **integer**: The column of the element (where the closing tag appears) in the XML-File. First column is 1
|
|
344
|
-
|
|
345
342
|
`.name` - **string**: Name of the element/tag
|
|
346
343
|
|
|
347
344
|
`.tag` - **string**: Synonym for `name`
|
|
348
345
|
|
|
349
|
-
`.text` - **string**: The text of an element, no matter if given as CDATA
|
|
346
|
+
`.text` - **string**: The text of an element, whether given as a CDATA section or as plain character data (PCDATA)
|
|
350
347
|
|
|
351
348
|
`.attributes` - **object**: All attributes of the object
|
|
352
349
|
|
|
@@ -367,7 +364,7 @@ For methods which return a single **Twig** element (e.g. `elt.next("book")`) the
|
|
|
367
364
|
|
|
368
365
|
## Limitations
|
|
369
366
|
|
|
370
|
-
This `xml-twig` module focuses on reading XML files. In principle it would be possible to create an XML file from scratch with the [Twig](./doc/twig.md#Twig) class. However, I think there are better modules available.
|
|
367
|
+
This `xml-twig` module focuses on reading XML files. In principle it would be possible to create an XML file from scratch with the [Twig](./doc/twig.md#Twig) class. However, I think there are better modules available. Of course, you may run operations like `elt.root().children().push(elt.root().children()[0])`, but I think this is not very convenient.
|
|
371
368
|
|
|
372
369
|
Accessing Twig elements by the [XML-Path](https://www.w3.org/TR/xpath/) language is not supported. One reason is that the `Twig` class models an [Element](https://www.w3schools.com/xml/xml_elements.asp) rather than a [Node](https://www.w3schools.com/xml/dom_nodes.asp), which would be more generic.
|
|
373
370
|
|