xml-twig 1.0.6 → 1.1.1

package/README.md CHANGED
@@ -5,7 +5,7 @@ Inspired by Perl module [XML::Twig](https://metacpan.org/pod/XML::Twig)
5
5
 
6
6
 
7
7
  ## When should I use this, motivation of this module
8
- When you need to read a XML file, then you have two pinciples:
8
+ When you need to read an XML file, there are two principles:
9
9
 
10
10
  * The **Document Object Model (DOM)** style. These parsers read the entire XML document into memory. Usually they provide easy methods to navigate the document tree or make modifications.
11
11
 
@@ -20,20 +20,19 @@ This module tries to combine both principles. The XML document can be read in ch
20
20
  ## Dependencies
21
21
  XML documents are read with either the [sax](https://www.npmjs.com/package/sax) or the [node-expat](https://www.npmjs.com/package/node-expat) parser. More parsers may be added in future releases. By default the `sax` parser is used.
22
22
 
23
- **NOTE: The `sax` or `node-expat` module is not automatically installed with this module. Install desired parser by yourself**
23
+ **NOTE: The `node-expat` module is not automatically installed with this module. Install it yourself if you want to use it.**
24
24
 
25
25
  ## Installation
26
26
 
27
- Install module like any other node module and the desired underlying parser:
27
+ Install module like any other node module and optionally `node-expat`:
28
28
  ```
29
29
  npm install xml-twig
30
30
 
31
- npm install sax
32
- # and/or
31
+ # and optionally
33
32
  npm install node-expat
34
33
 
35
34
  ```
36
- In my tests I parsed a 750 MB big XML file, the `node-expat` is around two times faster than `sax` (node-expat: 2:20 Minutes, sax: 4:20 Minutes). However, you may run into problems when you try to install the `node-expat` parser. That's the reason why underlying parsers are not installed automatically.
35
+ In my tests, I parsed a 750 MB XML file; `node-expat` was around twice as fast as `sax` (node-expat: 2:20 minutes, sax: 4:20 minutes). However, you may run into problems when you try to install the `node-expat` parser. That is why `node-expat` is not installed automatically.
37
36
 
38
37
 
39
38
  ## How to use it
@@ -42,7 +41,7 @@ In my tests I parsed a 750 MB big XML file, the `node-expat` is around two times
42
41
 
43
42
  In XML-Path, there are seven kinds of nodes: `element`, `attribute`, `text`, `namespace`, `processingInstruction`, `comment`, and `document`, see [Nodes at W3C](https://www.w3.org/TR/xpath-datamodel-31/#Node). XML documents are treated as trees of nodes.
44
43
 
45
- The [Twig](./doc/twig.md#Twig) Class models a "some-kind" Element tree. I try to follow the [XML-Path](https://www.w3.org/TR/xpath/) conventions whenver possible to avoid confusion.
44
+ The [Twig](./doc/twig.md#Twig) Class models a "some-kind" Element tree. I try to follow the [XML-Path](https://www.w3.org/TR/xpath/) conventions whenever possible to avoid confusion.
46
45
 
47
46
 
48
47
  #### XML-Namespaces
@@ -52,7 +51,7 @@ With option `{ namespaces : true }` you will get access to the `.namespace` prop
52
51
 
53
52
  ### Read XML Document
54
53
 
55
- - Read entire XML file at once
54
+ - **Read entire XML file at once**
56
55
 
57
56
  This module is designed to read huge XML files. Of course, it also works well for small files. First create the Twig parser. Then create a stream and pipe it to the parser.
58
57
 
@@ -61,57 +60,66 @@ With option `{ namespaces : true }` you will get access to the `.namespace` prop
61
60
  const twig = require('xml-twig')
62
61
 
63
62
  function rootHandler(elt) {
64
- console.log(`${elt.name} finished after ${elt.line} lines`);
63
+ console.log(`<${elt.name}> finished after ${parser.currentLine} lines`);
65
64
  }
66
65
 
67
- const parser = twig.createParser(rootHandler)
68
- fs.createReadStream(`${__dirname}/node_modules/xml-twig/samples/bookstore.xml`).pipe(parser)
69
- // Output -> bookstore finished after 48 lines
66
+ const parser = twig.createParser({ tag: twig.Root, function: rootHandler }, { method: 'sax' })
67
+ fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
68
+
69
+ // Output -> <bookstore> finished after 48 lines
70
+
70
71
 
71
72
  // Or use a Parser object instead of a Stream - works only with 'expat'!
72
- const expatParser = require('./twig.js').createParser(rootHandler, { method: 'expat' })
73
- expatParser.write('<html><head><title>Hello World</title></head><body><p>Foobar</p></body></html>');
73
+ const parser = twig.createParser({ tag: twig.Root, function: rootHandler }, { method: 'expat' })
74
+ parser.write('<html><head><title>Hello World</title></head><body><p>Foobar</p></body></html>');
75
+
74
76
  // Output -> xml finished after 1 lines
77
+ ```
78
+
79
+ If you prefer [events](https://nodejs.org/api/events.html), use the `event` property instead of `function` in the handler declaration:
75
80
 
76
81
  ```
82
+ const parser = twig.createParser({ tag: twig.Root, event: 'rootElement' }, { method: 'sax' })
83
+ fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
77
84
 
78
- - Read XML Document in chucks
85
+ parser.on('rootElement', (elt) => {
86
+ console.log(`<${elt.name}> finished after ${parser.currentLine} lines`);
87
+ })
88
+ ```
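The difference between the `function` and `event` styles can be illustrated without the library. The following is only a sketch of the general pattern built on Node's `EventEmitter`; `FakeParser` and its `finish()` method are hypothetical stand-ins and say nothing about how `xml-twig` is implemented internally.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a parser; xml-twig itself is not used here.
class FakeParser extends EventEmitter {
  constructor(handler) {
    super();
    this.handler = handler;
  }

  // Called when an element is complete: invoke the callback and/or
  // emit the named event, depending on the handler declaration.
  finish(elt) {
    if (typeof this.handler.function === 'function') this.handler.function(elt);
    if (typeof this.handler.event === 'string') this.emit(this.handler.event, elt);
  }
}

const seen = [];

// Callback style: the handler function is passed in up front.
const byCallback = new FakeParser({ function: (elt) => seen.push(`callback:${elt.name}`) });

// Event style: only an event name is declared, listeners are attached later.
const byEvent = new FakeParser({ event: 'rootElement' });
byEvent.on('rootElement', (elt) => seen.push(`event:${elt.name}`));

byCallback.finish({ name: 'bookstore' });
byEvent.finish({ name: 'bookstore' });
console.log(seen);
```

With the event style, any number of listeners can be attached or removed after the parser has been created, which is the usual reason to prefer it over a single callback.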
89
+
90
+
91
+ - **Read XML Document in chunks**
79
92
 
80
- The key feature of this module is to read XML files and process it in chunks. You need to create handler function for elements you like to process. If you don't specify any `name` property, then handler is called on every element.
93
+ The key feature of this module is to read and process XML files in chunks. You need to create handler functions for the elements you want to process.
81
94
 
82
95
 
83
96
  ```
84
- const fs = require('fs')
85
- const twig = require('xml-twig')
86
-
87
97
  function bookHandler(elt) {
88
- console.log(`${elt.attr("category")} ${elt.name} at line ${elt.line}`)
98
+ console.log(`${elt.attr("category")} ${elt.name} at line ${parser.currentLine}`)
89
99
  elt.purge() // -> without `purge()` the entire XML document will be loaded into memory
90
100
  }
91
101
 
92
102
  // different styles: below `handle_book` are all equivalent (with sample file `bookstore.xml`)
93
103
  handle_book = [
94
- { name: 'book', function: bookHandler },
95
- { name: 'ebook', function: bookHandler }
96
- ];
97
- handle_book = [
98
- { name: /book$/, function: bookHandler }
104
+ { tag: 'book', function: bookHandler },
105
+ { tag: 'ebook', function: bookHandler }
99
106
  ];
107
+ handle_book = { tag: /book$/, function: bookHandler };
100
108
  handle_book = [{
101
- name: function(name,elt) { return name.endsWith('book') },
109
+ tag: function(name, elt) { return name.endsWith('book') },
102
110
  function: bookHandler
103
111
  }];
104
112
  handle_book = [{
105
- name: function(name,elt) { return ['book', 'ebook'].includes(name) },
113
+ tag: function(name, elt) { return ['book', 'ebook'].includes(name) },
106
114
  function: bookHandler
107
115
  }];
108
116
  handle_book = [{
109
- name: function(name,elt) { return ['book', 'ebook'].includes(elt.name) },
117
+ tag: function(name, elt) { return ['book', 'ebook'].includes(elt.name) },
110
118
  function: bookHandler
111
119
  }];
112
120
 
113
- const parser = twig.createParser(handle_book)
114
- fs.createReadStream(`${__dirname}/node_modules/xml-twig/samples/bookstore.xml`).pipe(parser)
121
+ const parser = twig.createParser(handle_book, { method: 'sax' })
122
+ fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
115
123
 
116
124
  Output:
117
125
 
@@ -123,32 +131,18 @@ With option `{ namespaces : true }` you will get access to the `.namespace` prop
123
131
  web book at line 48
124
132
  ```
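To see why the different `tag` styles above select the same elements, here is a minimal sketch of how a string, regular expression, or function condition could be evaluated against an element name. `matchesTag` is a hypothetical helper written for this illustration; it is not part of `xml-twig` and may differ from the library's actual matching logic.

```javascript
// Hypothetical helper, not part of xml-twig: decides whether a handler's
// `tag` condition applies to an element name.
function matchesTag(condition, name, elt) {
  if (typeof condition === 'string') return condition === name;
  if (condition instanceof RegExp) return condition.test(name);
  if (typeof condition === 'function') return Boolean(condition(name, elt));
  return false;
}

// Element names as they might occur in a bookstore-like document.
const names = ['bookstore', 'book', 'ebook', 'title'];

// The same selection expressed in three styles.
const conditions = {
  regex: /book$/,
  endsWith: (name) => name.endsWith('book'),
  includes: (name, elt) => ['book', 'ebook'].includes(elt.name)
};

const selected = {};
for (const [label, condition] of Object.entries(conditions)) {
  selected[label] = names.filter((n) => matchesTag(condition, n, { name: n }));
}

// Two plain string conditions, as in the first `handle_book` variant.
selected.strings = names.filter((n) => ['book', 'ebook'].some((c) => matchesTag(c, n)));

console.log(selected);
```

In this sketch every style selects `book` and `ebook` and skips `bookstore` and `title`.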
125
133
 
126
- - Read every element from XML Document
127
-
128
- Skip the `name` proeprty if you like to read every element one-by-one:
129
-
130
-
134
+ - **Read every element from XML Document**
135
+
131
136
  ```
132
- const fs = require('fs')
133
- const twig = require('xml-twig')
134
-
135
137
  function anyHandler(elt) {
136
- console.log(`${' '.repeat(elt.level)}${elt.name} => "${elt.text ?? ''}" at line ${elt.line}`)
138
+ console.log(`${' '.repeat(elt.level)}${elt.name} => "${elt.text ?? ''}" at line ${parser.currentLine}`)
137
139
  elt.purge() // -> without `purge()` the entire XML document will be loaded into memory
138
-
139
- // Be aware if you run methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. on the current element.
140
- // Such calls return emtpy result, because following element are not yet read from the XML file.
141
- // You must navigate to an earlier element, e.g.
142
- `elt.root().children()[0].followingSibling()`
143
140
  }
144
141
 
145
- const handle_any = [ { function: anyHandler } ];
146
-
147
- // or use regular expression which matches every element, if you prefer
148
- const handle_any = [ { name: /./, function: anyHandler } ];
149
-
150
- const parser = twig.createParser(handle_any)
151
- fs.createReadStream(`${__dirname}/node_modules/xml-twig/samples/bookstore.xml`).pipe(parser)
142
+ const parser = twig.createParser({ tag: twig.Any, function: anyHandler })
143
+ // or with Regular Expression -> `{ tag: /./, function: anyHandler }`
144
+ // or with Function -> `{ tag: () => {return true}, function: anyHandler }`
145
+ fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser)
152
146
 
153
147
  Output:
154
148
 
@@ -167,51 +161,58 @@ With option `{ namespaces : true }` you will get access to the `.namespace` prop
167
161
 
168
162
  ```
169
163
 
164
+ Be careful when you call methods like `elt.followingSibling()`, `elt.descendant()`, `elt.next()`, etc. on the current element. Such calls return an empty result, because the following elements have not yet been read from the XML file. You must navigate from an earlier element, e.g.<br>
165
+ `elt.root().children()[0].followingSibling()`
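The reason for this restriction can be shown with a small simulation. The sketch below replays open/close events in the order a streaming parser would deliver them and records which siblings exist in the tree at the moment each element is closed. The tree builder is hypothetical and written only for this illustration; it does not use `xml-twig`.

```javascript
// Hypothetical miniature tree builder, not xml-twig code. It replays
// open/close events the way a streaming parser would deliver them.
const events = [
  ['open', 'bookstore'],
  ['open', 'book'], ['close', 'book'],
  ['open', 'ebook'], ['close', 'ebook'],
  ['close', 'bookstore']
];

const stack = [];
const siblingsSeenAtClose = {};

for (const [type, name] of events) {
  if (type === 'open') {
    const node = { name, children: [], parent: stack[stack.length - 1] ?? null };
    if (node.parent) node.parent.children.push(node);
    stack.push(node);
  } else {
    const node = stack.pop();
    // At close time, only siblings that were opened earlier exist in the tree.
    const all = node.parent ? node.parent.children : [node];
    const index = all.indexOf(node);
    siblingsSeenAtClose[node.name] = {
      preceding: all.slice(0, index).map((n) => n.name),
      following: all.slice(index + 1).map((n) => n.name)
    };
  }
}

console.log(siblingsSeenAtClose);
```

When `book` is closed, `ebook` has not been read yet, so its list of following siblings is empty. When `ebook` is closed, `book` is available as a preceding sibling, which is why navigating from an earlier element works.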
170
166
 
171
- - Read only parts from XML Document
172
167
 
173
- If you like to read only certain elements, use option `partial: true`. The `root` element is always read.
168
+ - **Read only parts from XML Document**
174
169
 
170
+ If you like to read only certain elements, use option `partial: true`. The `root` element is always read.
175
171
 
176
- This sample program reads the `root` element and `<ebook>` elements (include their children elements), and the brances to reach the element.
172
+ This sample program reads the `root` element and the `<ebook>` elements (including their child elements), and the branches needed to reach them.
177
173
 
178
174
  ```
179
- const handle_ebook = [{ name: 'ebook', function: ebookHandler }];
180
- const parser = require('./twig.js').createParser(handle_ebook, { partial: true })
181
- fs.createReadStream(`${__dirname}/samples/bookstore.xml`).pipe(parser);
175
+ const handle_ebook = [
176
+ { tag: 'ebook', function: ebookHandler },
177
+ { tag: twig.Root, function: rootHandler }
178
+ ];
179
+ const parser = twig.createParser(handle_ebook, { partial: true })
180
+ fs.createReadStream(`${__dirname}/bookstore.xml`).pipe(parser);
182
181
 
183
182
  function ebookHandler(elt) {
184
- console.log( elt.root().writer(' ').toString() );
185
- elt.purge();
183
+ console.log(`${elt.name} at line ${parser.currentLine}`)
184
+ }
185
+
186
+ function rootHandler(elt) {
187
+ console.log( elt.writer(' ').toString() );
186
188
  }
187
189
 
190
+
188
191
  Output:
189
192
 
193
+ ebook at line 23
194
+ ebook at line 41
190
195
  <bookstore>
191
196
  <ebook category="fantasy">
192
- <title lang="en">Harry Potter</title>
193
- <author>Joanne Kathleen Rowling</author>
194
- <year>2001</year>
195
- <price>12.99</price>
196
- <format>Kindle</format>
197
- <device>ePub</device>
197
+ <title lang="en">Harry Potter</title>
198
+ <author>Joanne Kathleen Rowling</author>
199
+ <year>2001</year>
200
+ <price>12.99</price>
201
+ <format>Kindle</format>
202
+ <device>ePub</device>
198
203
  </ebook>
199
- </bookstore>
200
-
201
- <bookstore>
202
204
  <ebook category="biography">
203
- <title lang="en">The Autobiography of Benjamin Franklin</title>
204
- <author>Benjamin Franklin</author>
205
- <year>1996</year>
206
- <price>39.99</price>
207
- <format>Kindle</format>
208
- <device>ePub</device>
205
+ <title lang="en">The Autobiography of Benjamin Franklin</title>
206
+ <author>Benjamin Franklin</author>
207
+ <year>1996</year>
208
+ <price>39.99</price>
209
+ <format>Kindle</format>
210
+ <device>ePub</device>
209
211
  </ebook>
210
212
  </bookstore>
211
-
212
213
  ```
213
214
 
214
- For details about other options, see [ParserOptions](./doc/twig.md#ParserOptions)
215
+ For details and other options, see [ParserOptions](./doc/twig.md#ParserOptions) and [TwigHandler](./doc/twig.md#TwigHandler)
215
216
 
216
217
 
217
218
  ### Access elements and attributes
@@ -220,9 +221,9 @@ For details about other options, see [ParserOptions](./doc/twig.md#ParserOptions
220
221
 
221
222
  `.hasAttribute(name)`: Checks if the attribute exists and returns `true` or `false`
222
223
 
223
- `.attr(cond)`: Returns the value of attribute. If more than one attribute matches, then it returns all attributes as object
224
+ `.attr(condition)`: Returns the value of attribute. If more than one attribute matches, then it returns all attributes as object
224
225
 
225
- `.attribute(cond)`: Get attributes as object or `null` if no matching attribute was found. If `cond` is `undefined`, then all attributes are returned.
226
+ `.attribute(condition)`: Get attributes as object or `null` if no matching attribute was found. If `condition` is `undefined`, then all attributes are returned.
226
227
 
227
228
  Specify attribute name or regular expression or custom condition. For details see [AttributeCondition](./doc/twig.md#AttributeCondition).<br>
228
229
  Let's assume an XML element like this: `<person firstName="Jean-Luc" lastName="Picard" age="59" />`
@@ -288,7 +289,7 @@ Here are some examples the get attribute and values:
288
289
 
289
290
  `.find(condition)` - **Twig**: Finds a specific element within the current element and returns the first match. In principle `.descendant(condition)[0]`
290
291
 
291
- `.purge()` - void: Removes the current element from tree. Usually this methond is called after the element has been processed and when not needed anymore.
292
+ `.purge()` - void: Removes the current element from tree. Usually this method is called after the element has been processed and when not needed anymore.
292
293
 
293
294
  `.purgeUpTo(elt)` - void: Purges up to the elt element. This allows you to keep part of the tree in memory when you purge.
294
295
 
@@ -308,11 +309,11 @@ You can specify condition on above methods. You can filter elements by following
308
309
 
309
310
  Example: `/book$/i`
310
311
 
311
- - With `ElementConditionFilter` you can speficy any custom filter function.<br>
312
+ - With `ElementConditionFilter` you can specify any custom filter function.<br>
312
313
 
313
314
  Example: `(name, elt) => { return name === 'book' && elt.children().length > 1 }`
314
315
 
315
- - With a `Twig` object, you can specify the element direclty. Apart from `purgeUpTo(elt)`, it is rarely used, because when you know the element then there is no reason to find it again.
316
+ - With a `Twig` object, you can specify the element directly. Apart from `purgeUpTo(elt)`, it is rarely used, because when you know the element then there is no reason to find it again.
316
317
 
317
318
  Example: `elt.children()[2]`
318
319
 
@@ -326,7 +327,7 @@ For methods which return a single **Twig** element (e.g. `elt.next("book")`) the
326
327
 
327
328
  #### Twig Properties
328
329
 
329
- `.isEmpty` - **boolean**: `true` if emtpy. An empty element ha no text nor any child elements, however empty elements can have attributes.
330
+ `.isEmpty` - **boolean**: `true` if empty. An empty element has no text nor any child elements; however, empty elements can have attributes.
330
331
 
331
332
  `.level` - **integer**: The level of the element. Root element has 0, children have 1, grand-children 2 and so on
332
333
 
@@ -338,15 +339,11 @@ For methods which return a single **Twig** element (e.g. `elt.next("book")`) the
338
339
 
339
340
  `.isLastChild` - **boolean**: `true` if the element is the last child in the parent
340
341
 
341
- `.line` - **integer**: The line of the element (where the closing tag appears) in the XML-File. First line is 1
342
-
343
- `.column` - **integer**: The column of the element (where the closing tag appears) in the XML-File. First column is 1
344
-
345
342
  `.name` - **string**: Name of the element/tag
346
343
 
347
344
  `.tag` - **string**: Synonym for `name`
348
345
 
349
- `.text` - **string**: The text of an element, no matter if given as CDATA entitiy or plain character data node (PCDATA)
346
+ `.text` - **string**: The text of an element, no matter if given as CDATA entity or plain character data node (PCDATA)
350
347
 
351
348
  `.attributes` - **object**: All attributes of the object
352
349
 
@@ -367,7 +364,7 @@ For methods which return a single **Twig** element (e.g. `elt.next("book")`) the
367
364
 
368
365
  ## Limitations
369
366
 
370
- This `xml-twig` module focus on reading a XML files. In principle it would be possible to create a XML file from scratch with the [Twig](./doc/twig.md#Twig) class. However, I think there are better modules available. Create/update/delete methods are rather limited. Perhaps I will add it in later release.
367
+ This `xml-twig` module focuses on reading XML files. In principle, it would be possible to create an XML file from scratch with the [Twig](./doc/twig.md#Twig) class. However, I think there are better modules available. Of course, you may run operations like `elt.root().children().push(elt.root().children()[0])`, but I think this is not so handy to use.
371
368
 
372
369
  Accessing Twig elements by [XML-Path](https://www.w3.org/TR/xpath/) language is not supported. One reason is that the `Twig` class models an [Element](https://www.w3schools.com/xml/xml_elements.asp) rather than a [Node](https://www.w3schools.com/xml/dom_nodes.asp), which would be more generic.
373
370