xml-stream-editor 0.1.1 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +24 -0
- package/README.md +145 -73
- package/dist/element.js +69 -0
- package/dist/index.js +2 -2
- package/dist/markup.js +0 -2
- package/dist/selector.js +60 -0
- package/dist/xml-stream-editor.js +90 -71
- package/package.json +2 -2
- package/src/types.d.ts +30 -5
package/CHANGELOG.md
CHANGED
@@ -1,6 +1,30 @@
 CHANGELOG
 ===
 
+0.2.1
+---
+
+Validate selector strings passed to `createXMLEditor` (for now, very basic.
+Just making sure there is only one space between element names in each
+selector).
+
+Fix issue where in some cases a selectors would match against the
+suffixes/endings of elements, and not always the full element name
+(e.g.,the selector `"steak"` would sometimes match elements like
+`<mistake>`).
+
+0.2.0
+---
+
+Significantly improve performance of selector matching.
+
+Add validation checks for created or modified XML element attribute names.
+
+Add config option, currently just with 1. the ability to disable validation
+of outgoing XML, and 2. configuring the "saxes" parser.
+
+Hopefully more helpful, additional text in README.md.
+
 0.1.1
 ---
 
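The 0.2.1 entry describes the new selector validation as "very basic". Below is a minimal sketch of why a doubled space gets rejected, assuming the check splits on single spaces the way `validate` in `selector.js` does: the split produces an empty string, which is not a valid XML name. `isValidNameSketch` and `validateSelector` are hypothetical names, and the regex is a simplified ASCII-only stand-in for `xnv.name` from xml-name-validator, not the real rule set.

```javascript
// Hypothetical, simplified stand-in for `xnv.name` from xml-name-validator:
// ASCII letters, digits, '_', '-', '.', ':' only (the real rules are broader).
const isValidNameSketch = (name) => /^[A-Za-z_:][\w.:-]*$/.test(name);

// Same shape as `validate` in selector.js: split on single spaces, check
// every piece, and return an [ok, error] tuple.
const validateSelector = (selector) => {
  for (const elmName of selector.split(" ")) {
    if (!isValidNameSketch(elmName)) {
      const msg = `Selector "${selector}" contains invalid name "${elmName}"`;
      return [false, new Error(msg)];
    }
  }
  return [true, undefined];
};

console.log(validateSelector("main character")[0]); // true
// "main  character".split(" ") is ["main", "", "character"]; "" is not a name.
console.log(validateSelector("main  character")[1].message);
// Selector "main  character" contains invalid name ""
```

Leading or trailing spaces are rejected for the same reason, since `" main".split(" ")` also yields an empty first piece.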
package/README.md
CHANGED
@@ -1,5 +1,4 @@
-xml-stream-editor
-===
+# xml-stream-editor
 
 Library to edit xml files in a streaming manner. Inspired by
 [xml-stream](https://www.npmjs.com/package/xml-stream), but 1. allows using
@@ -11,23 +10,85 @@ allows you to modify XML without needing to buffer the XML files in memory.
 For small to mid-sized XML files buffering is fine. But when editing very large
 files (e.g., multi-Gb files) buffering can be a problem or an absolute blocker.
 
-Usage
----
+## Usage
 
 `xml-stream-editor` is designed to be used with node's stream systems
 by subclassing [`stream.Transform`](https://nodejs.org/api/stream.html#class-streamtransform),
 so it can be used with the [streams promises API](https://nodejs.org/api/stream.html#streams-promises-api)
 and stdlib interfaces like [`stream.pipeline`](https://nodejs.org/api/stream.html#streampipelinestreams-options).
 
-The main way to use `xml-stream-editor` is to
-you want to edit using simple declarative selectors (like _very_ simple XPath
-rules or CSS selectors), and 2. write functions to be called with each
-matching XML element in the document. Those functions then either edit and
-return the provided element, or remove the element from the document
-by returning nothing.
+The main way to use `xml-stream-editor` is to:
 
-
-
+1. select which XML elements you want to edit using simple declarative selectors
+(like _very_ simple XPath rules or CSS selectors), and
+2. write functions to be called with each matching XML element in the document.
+Those functions then either edit and return the provided element, or remove
+the element from the document by returning nothing.
+
+### Calling xml-stream-editor
+
+The main way to call `xml-stream-editor` is by importing `createXMLEditor`,
+passing that function an object, with keys as `selectors` (strings that describe
+which elements to edit) as keys, and values being functions that get passed
+matching elements (to edit to delete those elements).
+
+### Elements Selectors
+
+You choose which XML elements to edit by writing (simple, limited) CSS-selector
+like statements. For example, the selector `parent child` will match
+all `<child>` elements that are _immediate_ children of `<parent>` nodes.
+**Note**, this is a little different than CSS selectors, where the selector
+`div a` would match `<a>` elements that were were contained in `<div>` elements,
+regardless of whether the `<a>` was an immediate child or more deeply nested.
+
+### Editing Elements
+
+Each element that matches a given selector is passed to the matching
+function, with the signature `(elm: Element) => Element | undefined`,
+and elements are structured as follows (as typescript):
+
+```typescript
+interface Element {
+  name: string
+  text?: string
+  attributes: Record<string, string>
+  children: Element[]
+}
+```
+
+### Options / Configuration
+
+In addition to a `rules` argument, `createReadStream` can also take
+a second `Options` argument. This object has the follow parameters.
+
+```typescript
+interface Options {
+  // Whether to check and enforce the validity of created and modified
+  // XML element names and attributes. If true, will throw an error
+  // if you create an XML element with a disallowed name (e.g.,
+  // <no spaces allowed>) or with an invalid attribute name
+  // (<my-elm a:b:c="too many namespaces" d@y="no @ in attr names">)
+  //
+  // This only checks the syntax of the XML element names and attributes.
+  // It does not perform any further validation, like if used namespaces
+  // are valid.
+  //
+  // default: `true`
+  validate: boolean // true
+
+  // Options defined by the "saxes" library, and passed to the "saxes" parser
+  //
+  // https://github.com/lddubeau/saxes/blob/4968bd09b5fd0270a989c69913614b0e640dae1b/src/saxes.ts#L557
+  // https://www.npmjs.com/package/saxes
+  saxes?: SaxesOptions
+}
+
+// The createXMLEditor function takes the options object as an optional
+// second argument.
+const transformer = createXMLEditor(rules, options)
+```
+
+## Examples
 
 Start with this input as `simpsons.xml`:
 
@@ -54,49 +115,53 @@ import { createReadStream } from 'node:fs'
 import { pipeline } from 'node:stream/promises'
 import { createXMLEditor, newElement } from 'xml-stream-editor'
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+// The keys of this object are selector strings, and the
+// values are functions that get called with matching elements.
+const rules = {
+  "main character": (elm) => {
+    switch (elm.text) {
+      case "Marge Simpson":
+        elm.attributes["hair"] = "blue"
+        break
+      case "Homer Simpson":
+        elm.text += " (Sr.)"
+        break
+      case "Lisa Simpson":
+        elm.text = ""
+
+        // Create an <instrument> element and make it a child element.
+        const instrumentElm = newElement("instrument")
+        instrumentElm.text = "saxophone"
+        elm.children.push(instrumentElm)
+
+        // Also create a new <name> element, and also make it a child
+        // element.
+        const nameElm = newElement("name")
+        nameElm.text = "Lisa Simpson"
+        elm.children.push(nameElm)
+        break
+      case "Bart Simpson":
+        // Remove the node by not returning an element.
+        return
     }
+    return elm
   }
-
-
-
-
-
-
+}
+await pipeline(
+  createReadStream("simpsons.xml"), // above example
+  createXMLEditor(rules),
+  process.stdout
+)
 ```
 
-And you'll find this printed to `STDOUT` (reformatted):
+And you'll find this printed to `STDOUT` (reformatted and annotated):
 
 ```xml
 <?xml version="1.0" encoding="UTF-8"?>
 <simpsons decade="90s" locale="US">
   <main>
+    <!-- These character elements were edited because they're
+         children of the main element (i.e., "main character"). -->
     <character sex="female" hair="blue">Marge Simpson</character>
     <character sex="male">Homer Simpson (Sr.)</character>
     <character sex="female">
@@ -104,16 +169,22 @@ And you'll find this printed to `STDOUT` (reformatted):
       <name>Lisa Simpson</name>
     </character>
     <character sex="female">Maggie Simpson</character>
+    <!-- There is no <character>Bart Simpson</character>
+         element anymore because the `case "Bart Simpson":`
+         case didn't return an element from the function. -->
   </main>
   <side>
+    <!-- These side character elements were not edited of affected
+         at all because they didn't match the given selector
+         (i.e., they are not "character" elements that are direct
+         children of "side" elements). -->
     <character sex="male">Disco Stu</character>
     <character sex="male" title="Dr.">Julius Hibbert</character>
   </side>
 </simpsons>
 ```
 
-Notes
----
+## Notes
 
 Nested editing functions are not supported. You can define as many editing
 rules as you'd like, but only one rule can be matching the xml document
@@ -128,33 +199,34 @@ import { createReadStream } from 'node:fs'
 import { pipeline } from 'node:stream/promises'
 import { createXMLEditor, newElement } from 'xml-stream-editor'
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-)
-
+const rules = {
+  // This rule will match first, since the "main" element will be
+  // identified first during parsing.
+  "main character": (elm) => {
+    // editing goes here
+    return elm
+  },
+  // And as a result, this rule will never match the "Disco Stu"
+  // or "Julius Hibbert" elements, since anytime the "character" selector
+  // would match a <character> element, that <character> element will
+  // have already been matched by the above "main character" selector.
+  //
+  // However, this selector would match (and so this function would
+  // be called with) the two <character> elements that are children
+  // of the <side> element.
+  "character": (elm) => {
+    // this function would never be called in this document.
+    return elm
+  },
+}
+await pipeline(
+  createReadStream("simpsons.xml"), // above example
+  createXMLEditor(rules),
+  process.stdout
+)
 ```
 
-Motivation
----
+## Motivation
 
 `xml-stream-editor` was built to handle the extremely large XML files
 generated by [Brave Software's PageGraph system](https://github.com/brave/brave-browser/wiki/PageGraph),
package/dist/element.js
ADDED
@@ -0,0 +1,69 @@
+import xnv from 'xml-name-validator';
+const isValidName = xnv.qname;
+export class Element {
+    attributes;
+    children = [];
+    name;
+    text;
+    constructor(name, attributes) {
+        this.name = name;
+        this.attributes = attributes
+            ? JSON.parse(JSON.stringify(attributes))
+            : Object.create(null);
+    }
+    validate() {
+        if (typeof this.name !== 'string') {
+            return [false, new Error('No name provided for element')];
+        }
+        if (!isValidName(this.name)) {
+            return [false, new Error(`"${this.name}" is not a valid element name`)];
+        }
+        if (typeof this.attributes !== 'object' || this.attributes === null) {
+            return [false, new Error('"attributes" property is not an object')];
+        }
+        for (const attrName of Object.keys(this.attributes)) {
+            if (!isValidName(attrName)) {
+                return [false, new Error(`"${attrName}" is not a valid attribute name`)];
+            }
+        }
+        for (const child of this.children) {
+            const [isChildValid, childError] = child.validate();
+            if (!isChildValid) {
+                return [false, childError];
+            }
+        }
+        return [true, undefined];
+    }
+}
+export class ParsedElement extends Element {
+    children = [];
+    static fromSaxesNode(node) {
+        // Here we check if each attribute name is simple (and so just a
+        // string), or in the namespace representation the "saxes" library
+        // uses (in which case attrValue will be a SaxesAttributeNS
+        // object, that we have to unpack a bit)
+        const attributes = Object.create(null);
+        if (node.attributes) {
+            for (const [attrName, attrValue] of Object.entries(node.attributes)) {
+                if (typeof attrValue === 'string') {
+                    attributes[attrName] = attrValue;
+                    continue;
+                }
+                attributes[attrValue.name] = attrValue.value;
+            }
+        }
+        return new ParsedElement(node.name, attributes);
+    }
+    clone() {
+        const cloneElm = new ParsedElement(this.name, this.attributes);
+        cloneElm.text = this.text;
+        cloneElm.children = [];
+        for (const aChildElm of this.children) {
+            cloneElm.children.push(aChildElm.clone());
+        }
+        return cloneElm;
+    }
+}
+export const newElement = (name, attributes) => {
+    return new Element(name, attributes);
+};
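A standalone sketch of two ideas in `element.js`: the constructor deep-copies `attributes` with a JSON round-trip, and `validate()` returns an `[ok, error]` tuple while recursing into children. `SketchElement` and `isValidNameSketch` are hypothetical names; the regex is a simplified ASCII-only stand-in for `xnv.qname` (at most one `:` prefix), not the full XML name rules.

```javascript
// Hypothetical, simplified stand-in for `xnv.qname`: an optional single
// "prefix:" followed by a local name, ASCII only.
const isValidNameSketch = (name) =>
  /^[A-Za-z_][\w.-]*(:[A-Za-z_][\w.-]*)?$/.test(name);

class SketchElement {
  children = [];
  text = undefined;
  constructor(name, attributes) {
    this.name = name;
    // Deep-copy the caller's attributes so edits to this element never
    // mutate the object that was passed in.
    this.attributes = attributes
      ? JSON.parse(JSON.stringify(attributes))
      : Object.create(null);
  }
  // Returns [true, undefined] or [false, Error], checking this element's
  // name, its attribute names, and then every child recursively.
  validate() {
    if (!isValidNameSketch(this.name)) {
      return [false, new Error(`"${this.name}" is not a valid element name`)];
    }
    for (const attrName of Object.keys(this.attributes)) {
      if (!isValidNameSketch(attrName)) {
        return [false, new Error(`"${attrName}" is not a valid attribute name`)];
      }
    }
    for (const child of this.children) {
      const [ok, err] = child.validate();
      if (!ok) return [false, err];
    }
    return [true, undefined];
  }
}

const source = { sex: "female" };
const elm = new SketchElement("character", source);
elm.attributes.hair = "blue";
console.log(source.hair); // undefined: the caller's object is untouched

elm.children.push(new SketchElement("bad name"));
const [ok, err] = elm.validate();
console.log(ok, err.message); // false "bad name" is not a valid element name
```

A JSON round-trip is a cheap deep copy for a flat map of strings; it would drop `undefined` values and non-JSON types, which attribute values are not expected to be here.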
package/dist/index.js
CHANGED
@@ -1,2 +1,2 @@
-
-export { createXMLEditor,
+export { Element, newElement } from './element.js';
+export { createXMLEditor, } from './xml-stream-editor.js';
package/dist/markup.js
CHANGED
package/dist/selector.js
ADDED
@@ -0,0 +1,60 @@
+// Represents the user provided selector strings, for defining which
+// XML elements in the XML document they want to edit.
+//
+// We modify the (simplified) XML paths used to i. allow user to define
+// which XML elements they want to edit, and ii. track the position of
+// each parsed XML element in the incoming XML document.
+//
+// This allows us to quickly check whether a user-provided "selector"
+// string matches the current XML parse stack with a simple .endsWith()
+// call (specifically pathToJustParsedXMLElement.endsWith(userProvidedSelector).
+import xnv from 'xml-name-validator';
+// Single character string that cannot appear in XML element names.
+const pathSeparator = '@';
+const process = (elementPath) => {
+    const collapsedWhiteSpace = elementPath.trim().replace(/ +/g, ' ');
+    return collapsedWhiteSpace.split(' ').map(x => pathSeparator + x).join('');
+};
+const validate = (selector) => {
+    for (const elmName of selector.split(' ')) {
+        if (xnv.name(elmName) === true) {
+            continue;
+        }
+        const msg = `Selector "${selector}" contains invalid name "${elmName}"`;
+        return [false, new Error(msg)];
+    }
+    return [true, undefined];
+};
+// Simple class used for tracking the path to an element in an XML document,
+// when parsing the XML document.
+//
+// Mostly this is just wrapping how we track the position of each element
+// in the XML document as we're parsing it, and annotating that path
+// in a way that makes it easy to check if a SelectorRule matches the
+// leaf-element in that path.
+export class ElementPath {
+    path;
+    pathForMatching;
+    constructor(path) {
+        this.path = path;
+        this.pathForMatching = process(path);
+    }
+    append(elmName) {
+        return new ElementPath(this.path + ' ' + elmName);
+    }
+    matches(selector) {
+        return this.pathForMatching.endsWith(selector.text);
+    }
+}
+export class SelectorRule {
+    text;
+    pathForMatching;
+    constructor(selector) {
+        const [isValid, err] = validate(selector);
+        if (!isValid) {
+            throw err;
+        }
+        this.text = process(selector);
+        this.pathForMatching = process(this.text);
+    }
+}
package/dist/xml-stream-editor.js
CHANGED
@@ -1,56 +1,25 @@
 import { strict as assert } from 'node:assert';
 import { Transform } from 'node:stream';
 import { SaxesParser } from 'saxes';
-import {
-
-
-        throw new Error(`"${name}" is not a valid XML element name`);
-    }
-    return {
-        name: name,
-        text: undefined,
-        attributes: {},
-        children: [],
-    };
-};
-const cloneElement = (elm) => {
-    const newElm = newElement(elm.name);
-    newElm.text = elm.text;
-    newElm.attributes = JSON.parse(JSON.stringify(elm.attributes));
-    newElm.children = elm.children.map(cloneElement);
-    return newElm;
-};
-const elementForNode = (node) => {
-    // Here we check if each attribute name is simple (and so just a
-    // string), or in the namespace representation the "saxes" library
-    // uses (in which case attrValue will be a SaxesAttributeNS
-    // object, that we have to unpack a bit)
-    const attributes = {};
-    if (node.attributes) {
-        for (const [attrName, attrValue] of Object.entries(node.attributes)) {
-            if (typeof attrValue === 'string') {
-                attributes[attrName] = attrValue;
-                continue;
-            }
-            attributes[attrValue.name] = attrValue.value;
-        }
-    }
-    return elementForNameAndAttrs(node.name, attributes);
-};
-const elementForNameAndAttrs = (name, attrs) => {
-    const newElm = newElement(name);
-    if (attrs) {
-        newElm.attributes = attrs;
-    }
-    return newElm;
-};
+import { ParsedElement } from './element.js';
+import { toAttrValue, toBodyText, toCloseTag, toOpenTag } from './markup.js';
+import { ElementPath, SelectorRule } from './selector.js';
 class XMLStreamEditorTransformer extends Transform {
+    // Default options, used if the caller doesn't provide any options (or
+    // merged into the provided options if the user only sets some options).
+    static defaultOptions = {
+        validate: true,
+        saxes: undefined,
+    };
+    // The configuration options, including possible options to pass to
+    // the (above) saxes parser at instantiation.
+    #options;
     // Used to track how deep in the XML tree the parser is, so that we can
     // check newly parsed elements against the passed editor rules.
-    #
-    // This is a map of
-    // no attributes, no name spaces, etc).
-    #
+    #parseStack = [];
+    // This is a map of objects that represent simple xpaths (i.e., only XML
+    // element names (no attributes, no name spaces, etc).
+    #rules;
     // Handle to the 'saxes' xml parser object.
     #xmlParser;
     // If set, tracks the current element in the parser stack that matches
@@ -60,18 +29,35 @@ class XMLStreamEditorTransformer extends Transform {
     // Store any errors we've been passed by the saxes parser so that we
     // can pass it along in the transformer callback next time we get data.
     #error;
+    #pushParsedElementToStack(element) {
+        const topOfStackElm = this.#parseStack.at(-1);
+        // We prefix every element name in the parse stack with '@' (a character
+        // that isn't valid in an XML element name) so that we can easily
+        // check if a selector matches the parse stack by just checking if
+        // the selector matches right end of the stack path.
+        const pathToElement = topOfStackElm
+            ? topOfStackElm.path.append(element.name)
+            : new ElementPath(element.name);
+        this.#parseStack.push({
+            element: element,
+            path: pathToElement,
+        });
+    }
     // Checks to see if the current editor stack (which tracks the current
     // element being parsed in the input XML stream, along with its parent
     // elements) matches any of the passed editor rules.
-    #
-        const
-
-
+    #doesStackMatchEditingRule() {
+        const topOfStack = this.#parseStack.at(-1);
+        // This method is only called after pushing an element to the stack,
+        // so this is guaranteed to be true
+        assert(topOfStack);
+        for (const [selectorRule, editorFunc] of this.#rules.entries()) {
+            if (topOfStack.path.matches(selectorRule)) {
                 // The depth of the root of this subtree in the stack
-                const depth = this.#
+                const depth = this.#parseStack.length - 1;
                 assert(depth >= 0);
-                const elmToEdit = this.#
-                return { selector:
+                const elmToEdit = this.#parseStack[depth].element;
+                return { selector: selectorRule, func: editorFunc, element: elmToEdit };
             }
         }
         return null;
@@ -89,12 +75,19 @@ class XMLStreamEditorTransformer extends Transform {
         }
         this.push(toCloseTag(element.name));
     }
-    #
+    #callUserFuncOnCompletedElementAndWriteToStream() {
         assert(this.#elmToEditInfo);
-        const clonedElm =
+        const clonedElm = this.#elmToEditInfo.element.clone();
+        const editElmFunc = this.#elmToEditInfo.func;
         try {
-            const editedElm =
+            const editedElm = editElmFunc(clonedElm);
             if (editedElm) {
+                if (this.#options.validate === true) {
+                    const [isValid, error] = editedElm.validate();
+                    if (!isValid) {
+                        throw error;
+                    }
+                }
                 this.#writeElementToStream(editedElm);
             }
             this.#elmToEditInfo = undefined;
@@ -115,14 +108,14 @@ class XMLStreamEditorTransformer extends Transform {
             // and append ourselves to the stack.
             // 3. We are NOT the root of a subtree to be edited, in which case
             // we just add ourselves to the stack.
-            const newElement =
-            this.#
+            const newElement = ParsedElement.fromSaxesNode(node);
+            this.#pushParsedElementToStack(newElement);
             // Check for case one
             if (this.#isInSubtreeToBeEdited()) {
                 return;
             }
             // Check for case two, if we're at the root of a subtree to edit.
-            const matchingElementInfo = this.#
+            const matchingElementInfo = this.#doesStackMatchEditingRule();
             if (matchingElementInfo !== null) {
                 this.#elmToEditInfo = matchingElementInfo;
                 return;
@@ -140,9 +133,9 @@ class XMLStreamEditorTransformer extends Transform {
             // print the text out immediately.
             // Check for case one
             if (this.#isInSubtreeToBeEdited()) {
-                const topOfStack = this.#
+                const topOfStack = this.#parseStack.at(-1);
                 assert(topOfStack);
-                topOfStack.text = text;
+                topOfStack.element.text = text;
                 return;
             }
             // Otherwise we're in case two, and can print the text out immediately.
@@ -176,36 +169,58 @@ class XMLStreamEditorTransformer extends Transform {
             // 3. We've completed a CHILD NODE in a subtree being edited,
             // in which case we append this node to our buffered subtree
             // and pop it off the stack.
-            const
+            const completedStackElement = this.#parseStack.pop();
+            const completedElm = completedStackElement?.element;
             assert(completedElm);
             // Check for case one
             if (this.#isInSubtreeToBeEdited() === false) {
+                // Write the closing tag of the just-completed element
+                // to the write stream.
                 this.push(toCloseTag(node.name));
                 return;
             }
             // Check for case two
             assert(this.#elmToEditInfo);
             if (completedElm === this.#elmToEditInfo.element) {
-                this.#
+                this.#callUserFuncOnCompletedElementAndWriteToStream();
                 return;
             }
             // Otherwise, we must be in case three
-
-
+            const topOfStack = this.#parseStack.at(-1);
+            assert(topOfStack);
+            topOfStack.element.children.push(completedElm);
         });
     }
-    constructor(
+    constructor(editingRules, options) {
         super();
-
-
+        const defaultOptions = XMLStreamEditorTransformer.defaultOptions;
+        const mergedOptions = {
+            validate: options?.validate ?? defaultOptions.validate,
+            saxes: options?.saxes ?? defaultOptions.saxes,
+        };
+        this.#options = mergedOptions;
+        this.#rules = new Map();
+        for (const [selector, editFunc] of Object.entries(editingRules)) {
+            // This will throw if one of the user-provided selectors
+            // is invalid.
+            const parsedSelector = new SelectorRule(selector);
+            this.#rules.set(parsedSelector, editFunc);
+        }
+        this.#xmlParser = new SaxesParser(this.#options.saxes);
         this.#configureParserCallbacks();
     }
     _transform(chunk, encoding, callback) {
+        // Don't do any parsing if something threw an error parsing the previous
+        // chunk.
         if (this.#error) {
             callback(this.#error);
             return;
         }
         this.#xmlParser.write(chunk);
+        // And, similarly, don't continuing parsing if we've caught any errors
+        // parsing the current chunk. This looks a little redundant, but because
+        // the XML from the input stream is parsed asynchronously, this is
+        // just an attempt to catch and handle an error as quickly as possible.
         if (this.#error) {
             callback(this.#error);
             return;
@@ -213,6 +228,10 @@ class XMLStreamEditorTransformer extends Transform {
         callback();
     }
 }
-
-
+// This is the entry point to the library, and is designed / named
+// to mirror the naming of transformers in the standard lib
+// (e.g., createGzip , createDeflate, etc in the stdlib zlib module,
+// or createHmac, createECDH, etc in the stdlib crypto module).
+export const createXMLEditor = (rules, options) => {
+    return new XMLStreamEditorTransformer(rules, options);
 };
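The opentag, text and closetag callbacks above distinguish three cases: outside any matched subtree (write through immediately), at the root of a matched subtree (start buffering), and inside one (attach the element to its buffered parent). Below is a simplified, hypothetical model of that state machine over plain event objects instead of saxes callbacks. `runEditor`, `openTag` and `serialize` are illustrative names; the sketch skips the cloning, validation and any escaping the real transformer may do, and it writes an element's text before its children because the element model has a single `text` field.

```javascript
// Serialize an element's opening tag with its attributes (no escaping).
const openTag = (elm) =>
  `<${elm.name}` +
  Object.entries(elm.attributes).map(([k, v]) => ` ${k}="${v}"`).join("") +
  ">";

// Serialize a buffered element: text first, then children, then close tag.
const serialize = (elm) =>
  openTag(elm) + (elm.text ?? "") + elm.children.map(serialize).join("") + `</${elm.name}>`;

const runEditor = (events, matches, editFunc) => {
  const out = [];
  const stack = []; // entries: { elm, path }
  let editRoot; // root element of the subtree currently being buffered
  for (const ev of events) {
    if (ev.type === "open") {
      const parent = stack.at(-1);
      const path = parent ? [...parent.path, ev.name] : [ev.name];
      const elm = { name: ev.name, attributes: { ...ev.attributes }, text: undefined, children: [] };
      stack.push({ elm, path });
      if (editRoot) continue; // inside a buffered subtree: keep buffering
      if (matches(path)) {
        editRoot = elm; // root of a subtree to edit: start buffering
        continue;
      }
      out.push(openTag(elm)); // not being edited: write through immediately
    } else if (ev.type === "text") {
      if (editRoot) stack.at(-1).elm.text = ev.text;
      else out.push(ev.text);
    } else {
      const { elm } = stack.pop();
      if (!editRoot) {
        out.push(`</${elm.name}>`);
      } else if (elm === editRoot) {
        // Subtree complete: call the user function, write its result (or
        // nothing, if it returned undefined), and stop buffering.
        const edited = editFunc(elm);
        if (edited) out.push(serialize(edited));
        editRoot = undefined;
      } else {
        // A child inside the buffered subtree: attach it to its parent.
        stack.at(-1).elm.children.push(elm);
      }
    }
  }
  return out.join("");
};

const events = [
  { type: "open", name: "main" },
  { type: "open", name: "character", attributes: { sex: "male" } },
  { type: "text", text: "Homer Simpson" },
  { type: "close" },
  { type: "open", name: "character", attributes: { sex: "male" } },
  { type: "text", text: "Bart Simpson" },
  { type: "close" },
  { type: "close" },
];
const isMainCharacter = (path) => path.at(-2) === "main" && path.at(-1) === "character";
console.log(runEditor(events, isMainCharacter, (elm) => {
  if (elm.text === "Bart Simpson") return undefined; // drop this element
  elm.text += " (Sr.)";
  return elm;
}));
// <main><character sex="male">Homer Simpson (Sr.)</character></main>
```

Only the matched subtree is ever held in memory; everything outside it is written as soon as it is parsed, which is the property that makes the streaming approach viable for very large files.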
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "xml-stream-editor",
-  "version": "0.
+  "version": "0.2.1",
   "description": "A streaming xml editor.",
   "main": "dist/index.js",
   "files": [
@@ -18,7 +18,7 @@
   "type": "module",
   "types": "src/types.d.ts",
   "repository": {
-    "url": "https://github.com/pes10k/xml-stream-editor.git"
+    "url": "git+https://github.com/pes10k/xml-stream-editor.git"
   },
   "keywords": [
     "xml",
package/src/types.d.ts
CHANGED
@@ -2,16 +2,41 @@ import { Transform } from 'node:stream'
 
 import { SaxesOptions } from 'saxes'
 
-export declare
-  name: string
-  text?: string
+export declare class Element {
+  constructor (name: string, attributes?: Record<string, string>)
   attributes: Record<string, string>
   children: Element[]
+  name: string
+  text?: string
+}
+
+export declare interface Options {
+  // Whether to check and enforce the validity of created and modified
+  // XML element names and attributes. If true, will throw an error
+  // if you create an XML element with a disallowed name (e.g.,
+  // <no spaces allowed>) or with an invalid attribute name
+  // (<my-elm a:b:c="too many namespaces" d@y="no @ in attr names">)
+  //
+  // This only checks the syntax of the XML element names and attributes.
+  // It does not perform any further validation, like if used namespaces
+  // are valid.
+  //
+  // default: `true`
+  validate: boolean // true
+
+  // Options defined by the "saxes" library, and passed to the "saxes" parser
+  //
+  // eslint-disable-next-line max-len
+  // https://github.com/lddubeau/saxes/blob/4968bd09b5fd0270a989c69913614b0e640dae1b/src/saxes.ts#L557
+  // https://www.npmjs.com/package/saxes
+  saxes?: SaxesOptions
 }
 
 export type Selector = string
 export type EditorFunc = (elm: Element) => Element | undefined
-export type
+export type EditingRules = Record<Selector, EditorFunc>
+// Just wrapper for `new Element(name)`, mostly a remnant of a previous
+// implementation approach.
 export declare const newElement: (name: string) => Element
 export declare const createXMLEditor: (
-
+  editingRules: EditingRules, options?: Options) => Transform
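The transformer's constructor merges the caller's `Options` with `defaultOptions` using `??`. A small sketch of why that operator matters: an explicit `validate: false` survives the merge, whereas a hypothetical `||`-based merge would silently replace it with the default `true`. `mergeOptions` and `mergeWithOr` are illustrative names, and `{ xmlns: true }` is only an example payload; this sketch never constructs a saxes parser.

```javascript
// Same merge shape as the constructor in xml-stream-editor.js: each option
// falls back to its default only when it is null or undefined.
const defaultOptions = { validate: true, saxes: undefined };
const mergeOptions = (options) => ({
  validate: options?.validate ?? defaultOptions.validate,
  saxes: options?.saxes ?? defaultOptions.saxes,
});

// Hypothetical contrast: `||` treats `false` as "missing" because it is falsy.
const mergeWithOr = (options) => ({
  validate: options?.validate || defaultOptions.validate,
  saxes: options?.saxes || defaultOptions.saxes,
});

console.log(mergeOptions(undefined).validate); // true
console.log(mergeOptions({ validate: false }).validate); // false
console.log(mergeWithOr({ validate: false }).validate); // true (the pitfall ?? avoids)
console.log(mergeOptions({ saxes: { xmlns: true } })); // { validate: true, saxes: { xmlns: true } }
```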