xml-sax-ts 0.3.0 → 0.4.1

package/README.md CHANGED
@@ -127,6 +127,120 @@ const xml = serializeXml(
127
127
  // </root>
128
128
  ```
129
129
 
130
+ ## Benchmarking
131
+
132
+ Run the reproducible benchmark harness:
133
+
134
+ ```bash
135
+ npm run bench
136
+ ```
137
+
138
+ Quick run (fewer rounds):
139
+
140
+ ```bash
141
+ npm run bench:quick
142
+ ```
143
+
144
+ The benchmark now runs multiple rounds and reports the median, mean, and standard deviation of ops/s, so results are easier to compare across runs.
145
+
146
+ - `xml-sax-ts:sax` scenarios measure streaming event parsing
147
+ - `xml-sax-ts:sax` scenarios include explicit `xmlns=true/false` modes
148
+ - `xml-sax-ts:sax ... no-position` shows upper-bound throughput with `trackPosition: false`
149
+ - `comparable:*` scenarios run minimal equivalent feature sets for fair `xml-sax-ts` vs `saxes` comparison
150
+ - `xml-sax-ts:tree` scenario measures full tree parsing (`parseXmlString`)
151
+ - `sax` and `saxes` scenarios provide common SAX parser comparisons
152
+ - `fast-xml-parser` scenarios measure object parsing on the same input corpus
153
+
154
+ `fast-xml-parser`, `sax`, and `saxes` are included as dev dependencies so comparison is available out of the box.
155
+
156
+ Example output includes a direct ratio line:
157
+
158
+ `Comparable parse ratio (xml-sax-ts:sax vs fast-xml-parser:object): ...x`
159
+
160
+ Note: SAX event parsing and object materialization are not identical workloads. Use the tree scenario for a closer semantic comparison.
161
+
162
+ ### Benchmark Methodology
163
+
164
+ - Benchmark command: `npm run bench`
165
+ - Runtime: Node `v24.7.0`
166
+ - Benchmark config defaults: `BENCH_ROUNDS=5`, `BENCH_MIN_MS=1200`, `BENCH_WARMUP=10`
167
+ - Corpus: repeated fixture corpus (`basic.xml`, `mixed.xml`, `namespaces.xml`) plus an entity-heavy synthetic case
168
+ - Output metric: median ops/s across rounds (with mean and stddev also shown)
169
+
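The per-round summary described above (median, mean, stddev) can be sketched as follows. This is a minimal illustration, not the harness's actual code: `summarizeRounds` and `RoundSummary` are hypothetical names, the sample values are made up, and the use of population (rather than sample) standard deviation is an assumption.

```typescript
// Hypothetical helper: summarize per-round throughput samples (ops/s).
// Assumption: population standard deviation; the real harness may differ.
interface RoundSummary {
  median: number;
  mean: number;
  stddev: number;
}

function summarizeRounds(opsPerSecond: readonly number[]): RoundSummary {
  if (opsPerSecond.length === 0) {
    throw new Error("at least one round is required");
  }
  // Sort a copy numerically; the default sort compares as strings.
  const sorted = [...opsPerSecond].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  const variance =
    sorted.reduce((sum, v) => sum + (v - mean) ** 2, 0) / sorted.length;
  return { median, mean, stddev: Math.sqrt(variance) };
}

// Made-up samples for five rounds (matching the BENCH_ROUNDS=5 default).
const summary = summarizeRounds([10, 20, 30, 40, 50]);
console.log(summary.median, summary.mean, summary.stddev.toFixed(2));
```

Reporting the median as the headline metric keeps a single slow or fast round from skewing the result, while the stddev indicates how noisy the run was.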
170
+ ### Benchmark Environment
171
+
172
+ - Published sample run device: MacBook Pro M4
173
+ - Memory: 48 GB RAM
174
+ - CPU: 14-core CPU
175
+ - GPU: 20-core GPU
176
+
177
+ The GPU is not used by these Node.js parser benchmarks, but it is listed for full machine disclosure.
178
+
179
+ Latest sample (`npm run bench` defaults, Node `v24.7.0`):
180
+
181
+ | Scenario | Median ops/s |
182
+ | --- | ---: |
183
+ | `xml-sax-ts:sax single-feed xmlns=true` | 15,155.48 |
184
+ | `xml-sax-ts:sax single-feed xmlns=false` | 21,178.68 |
185
+ | `xml-sax-ts:sax single-feed xmlns=false no-position` | 22,230.83 |
186
+ | `sax:single-feed xmlns=false` | 8,357.12 |
187
+ | `saxes:single-feed xmlns=false` | 23,296.03 |
188
+ | `xml-sax-ts:tree parseXmlString` | 8,833.38 |
189
+ | `fast-xml-parser:object parse` | 6,128.40 |
190
+
191
+ Comparable minimal feature scenarios (fair `saxes` parity check):
192
+
193
+ | Scenario | Median ops/s |
194
+ | --- | ---: |
195
+ | `comparable:xml-sax-ts single-feed xmlns=false position=false` | 22,637.22 |
196
+ | `comparable:saxes single-feed xmlns=false position=false` | 23,305.98 |
197
+ | `comparable:xml-sax-ts single-feed xmlns=true position=false` | 16,468.14 |
198
+ | `comparable:saxes single-feed xmlns=true position=false` | 11,868.48 |
199
+
200
+ - `xml-sax-ts:sax (xmlns=false)` vs `sax (xmlns=false)`: `2.534x`
201
+ - `xml-sax-ts:sax (xmlns=true)` vs `sax (xmlns=true)`: `2.989x`
202
+ - `xml-sax-ts:sax (xmlns=false)` vs `saxes (xmlns=false)`: `0.909x`
203
+ - `comparable minimal (xmlns=false, xml-sax-ts vs saxes)`: `0.971x`
204
+ - `comparable minimal (xmlns=true, xml-sax-ts vs saxes)`: `1.388x`
205
+ - `xml-sax-ts:tree` vs `fast-xml-parser:object`: `1.441x`
206
+
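Most of the ratios above follow directly from the median ops/s tables (the `sax (xmlns=true)` baseline is not listed, so that ratio cannot be recomputed from them). A small sketch of the arithmetic, assuming each ratio is simply the first median divided by the second; `medianOpsPerSecond` and `ratio` are illustrative names, not part of the benchmark harness:

```typescript
// Median ops/s copied from the sample tables above.
const medianOpsPerSecond: Record<string, number> = {
  "xml-sax-ts:sax single-feed xmlns=false": 21178.68,
  "sax:single-feed xmlns=false": 8357.12,
  "saxes:single-feed xmlns=false": 23296.03,
  "xml-sax-ts:tree parseXmlString": 8833.38,
  "fast-xml-parser:object parse": 6128.4,
  "comparable:xml-sax-ts single-feed xmlns=false position=false": 22637.22,
  "comparable:saxes single-feed xmlns=false position=false": 23305.98,
  "comparable:xml-sax-ts single-feed xmlns=true position=false": 16468.14,
  "comparable:saxes single-feed xmlns=true position=false": 11868.48,
};

// Hypothetical helper: throughput of `a` relative to `b`, to three decimals.
function ratio(a: string, b: string): string {
  const x = medianOpsPerSecond[a];
  const y = medianOpsPerSecond[b];
  if (x === undefined || y === undefined) {
    throw new Error("unknown scenario");
  }
  return `${(x / y).toFixed(3)}x`;
}

console.log(
  ratio("xml-sax-ts:sax single-feed xmlns=false", "sax:single-feed xmlns=false"),
); // 2.534x
```

A value above `1x` means the first scenario completed more operations per second than the second.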
207
+ Benchmark visualization (same sample run):
208
+
209
+ ```mermaid
210
+ xychart-beta
211
+ title "SAX Throughput (xmlns=false)"
212
+ x-axis ["xml-sax-ts", "xml-sax-ts no-position", "sax", "saxes"]
213
+ y-axis "ops/s" 0 --> 24000
214
+ bar [21178.68, 22230.83, 8357.12, 23296.03]
215
+ ```
216
+
217
+ ```mermaid
218
+ xychart-beta
219
+ title "Object/Tree Throughput"
220
+ x-axis ["xml-sax-ts tree", "fast-xml-parser object"]
221
+ y-axis "ops/s" 0 --> 9000
222
+ bar [8833.38, 6128.40]
223
+ ```
224
+
225
+ ```mermaid
226
+ xychart-beta
227
+ title "Comparable Minimal (position=false)"
228
+ x-axis ["xml-sax-ts xmlns=false", "saxes xmlns=false", "xml-sax-ts xmlns=true", "saxes xmlns=true"]
229
+ y-axis "ops/s" 0 --> 24000
230
+ bar [22637.22, 23305.98, 16468.14, 11868.48]
231
+ ```
232
+
233
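+ Legend: in the first chart the first two bars are `xml-sax-ts`, in the second chart the first bar is, and in the third chart the `xml-sax-ts` and `saxes` bars alternate, starting with `xml-sax-ts`.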
234
+
235
+ Best fair-comparison read:
236
+
237
+ - Use `comparable:*` scenarios for `xml-sax-ts` vs `saxes` parity checks.
238
+ - `xml-sax-ts ... no-position` is useful for gauging peak throughput, but it is not a default-to-default comparison.
239
+
240
+ These values are machine-dependent; rerun on your hardware for release-quality numbers.
241
+
242
+ Current status for this environment: comparable runs show `xml-sax-ts` at `0.971x` of `saxes` on `xmlns=false` and `1.388x` on `xmlns=true`.
243
+
130
244
  ## API
131
245
 
132
246
  ### `XmlSaxParser`
@@ -147,6 +261,8 @@ new XmlSaxParser(options?: ParserOptions)
147
261
  | `xmlns` | `boolean` | `true` | Enable namespace resolution |
148
262
  | `includeNamespaceAttributes` | `boolean` | `false` | Include `xmlns:*` attributes in tag output |
149
263
  | `allowDoctype` | `boolean` | `true` | Allow `<!DOCTYPE …>` declarations |
264
+ | `coalesceText` | `boolean` | `true` | Merge adjacent text chunks into one `onText` event |
265
+ | `trackPosition` | `boolean` | `true` | Track line/column; disable for faster parsing |
150
266
  | `onOpenTag` | `function` | — | Called for each opening / self-closing tag |
151
267
  | `onCloseTag` | `function` | — | Called for each closing tag |
152
268
  | `onText` | `function` | — | Called for text nodes |
@@ -156,6 +272,16 @@ new XmlSaxParser(options?: ParserOptions)
156
272
  | `onDoctype` | `function` | — | Called for DOCTYPE declarations |
157
273
  | `onError` | `function` | — | Called on parse errors |
158
274
 
275
+ By default (`coalesceText: true`), adjacent text chunks are merged and emitted as a single `onText` callback per structural boundary. Set `coalesceText: false` to receive an `onText` callback for each text chunk as it is parsed, without merging.
276
+
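To make the coalescing behavior concrete, here is a minimal sketch of the idea in user code: buffer text chunks and emit them as one string at each structural boundary. `TextCoalescer` is a hypothetical class for illustration only and is not how the parser implements `coalesceText` internally.

```typescript
// Hypothetical buffer that merges text chunks until a boundary is reached.
class TextCoalescer {
  private chunks: string[] = [];
  private readonly emit: (text: string) => void;

  constructor(emit: (text: string) => void) {
    this.emit = emit;
  }

  // Collect one text chunk (for example, from an uncoalesced text callback).
  push(chunk: string): void {
    if (chunk.length > 0) {
      this.chunks.push(chunk);
    }
  }

  // Call at a structural boundary (tag open/close, end of input).
  flush(): void {
    if (this.chunks.length === 0) {
      return;
    }
    const text = this.chunks.join("");
    this.chunks = [];
    this.emit(text);
  }
}

const merged: string[] = [];
const coalescer = new TextCoalescer((text) => merged.push(text));
coalescer.push("Hel");
coalescer.push("lo");
coalescer.flush(); // boundary reached: "Hello" is emitted once
console.log(merged);
```

With `coalesceText: false`, a consumer that still wants merged text could, under these assumptions, call `push` from `onText` and `flush` from `onOpenTag` and `onCloseTag`.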
277
+ `trackPosition` controls line/column tracking for parser errors. When set to `false`, parsing is faster and `XmlSaxError` still reports `offset`, while `line` and `column` are set to `0`.
278
+
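When `trackPosition` is `false` and an error carries only `offset`, the line and column can be recovered afterwards if the original input is still available. The sketch below assumes 1-based lines and columns and that `offset` is an index into the JavaScript string; `positionFromOffset` is a hypothetical helper, and the library's own conventions may differ.

```typescript
interface Position {
  line: number;
  column: number;
}

// Hypothetical helper: derive a 1-based line/column from a string offset.
// Assumption: lines are separated by "\n" and offset counts UTF-16 code units.
function positionFromOffset(source: string, offset: number): Position {
  const end = Math.max(0, Math.min(offset, source.length));
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < end; i++) {
    if (source.charCodeAt(i) === 10) {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: end - lineStart + 1 };
}

const pos = positionFromOffset("<a>\n  <b", 6);
console.log(pos.line, pos.column); // 2 3
```

This moves the position cost off the hot parsing path: the scan runs only when an error actually needs to be reported.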
279
+ Event payload note (breaking change): with `xmlns: false`, parser events now emit plain-mode tag shapes aligned with `saxes` performance semantics.
280
+
281
+ - `onOpenTag(tag).attributes` values are strings (not `XmlAttribute` objects)
282
+ - `onOpenTag(tag)` and `onCloseTag(tag)` omit `prefix`, `local`, and `uri`
283
+ - With `xmlns: true`, full namespace metadata remains present
284
+
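Code that must work in both modes can normalize attribute access. The sketch below uses hypothetical type names (`PlainTag`, `NamespacedTag`, `NamespacedAttribute`) that approximate the shapes described above; they are not the package's exported types, and the `value` field on attribute objects is an assumption.

```typescript
// Assumed shape of an attribute object in xmlns: true mode.
interface NamespacedAttribute {
  value: string;
  prefix: string;
  local: string;
  uri: string;
}

// Assumed tag shape with xmlns: false (string attribute values, no namespace fields).
interface PlainTag {
  name: string;
  attributes: Record<string, string>;
}

// Assumed tag shape with xmlns: true (full namespace metadata).
interface NamespacedTag {
  name: string;
  prefix: string;
  local: string;
  uri: string;
  attributes: Record<string, NamespacedAttribute>;
}

// Hypothetical helper: read an attribute value regardless of parser mode.
function attributeValue(
  tag: PlainTag | NamespacedTag,
  key: string,
): string | undefined {
  const attr = tag.attributes[key];
  if (attr === undefined) {
    return undefined;
  }
  return typeof attr === "string" ? attr : attr.value;
}

const plain: PlainTag = { name: "item", attributes: { id: "42" } };
console.log(attributeValue(plain, "id")); // 42
```

A helper like this can ease migration for handlers written against the older object-valued attributes when switching to `xmlns: false`.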
159
285
  ### `parseXmlString(xml, options?)`
160
286
 
161
287
  Convenience function that parses a complete XML string into an `XmlNode` tree using `XmlSaxParser` + `TreeBuilder` internally.
@@ -198,7 +324,7 @@ Builds an `XmlNode` with `buildXmlNode` and serializes it with `serializeXml`.
198
324
  | `textKey` | `string` | `"#text"` | Key used for text nodes |
199
325
  | `stripNamespaces` | `boolean` | `false` | Strip namespace prefixes from names |
200
326
  | `arrayElements` | `Set\<string\> \| (name: string, path: string[]) => boolean` | — | Force specific elements to always be arrays |
201
- | `rootName` | `string` | — | Root element name when object has multiple keys |
327
+ | `rootName` | `string` | — | Root element name when object has multiple keys|
202
328
 
203
329
  ### `serializeXml(node, options?)`
204
330