moxml 0.1.8 → 0.1.10

Files changed (57)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +22 -39
  3. data/README.adoc +51 -20
  4. data/docs/_config.yml +3 -3
  5. data/docs/_guides/index.adoc +15 -7
  6. data/docs/_guides/modifying-xml.adoc +0 -1
  7. data/docs/_guides/node-api-consistency.adoc +572 -0
  8. data/docs/_guides/parsing-xml.adoc +0 -1
  9. data/docs/_guides/xml-declaration.adoc +450 -0
  10. data/docs/_pages/adapter-compatibility.adoc +1 -1
  11. data/docs/_pages/adapters/headed-ox.adoc +9 -9
  12. data/docs/_pages/adapters/index.adoc +0 -1
  13. data/docs/_pages/adapters/libxml.adoc +1 -2
  14. data/docs/_pages/adapters/nokogiri.adoc +1 -2
  15. data/docs/_pages/adapters/oga.adoc +1 -2
  16. data/docs/_pages/adapters/ox.adoc +2 -1
  17. data/docs/_pages/adapters/rexml.adoc +2 -3
  18. data/docs/_pages/best-practices.adoc +0 -1
  19. data/docs/_pages/compatibility.adoc +0 -1
  20. data/docs/_pages/configuration.adoc +0 -1
  21. data/docs/_pages/error-handling.adoc +0 -1
  22. data/docs/_pages/headed-ox-limitations.adoc +16 -0
  23. data/docs/_pages/installation.adoc +0 -1
  24. data/docs/_pages/node-api-reference.adoc +93 -4
  25. data/docs/_pages/performance.adoc +0 -1
  26. data/docs/_pages/quick-start.adoc +0 -1
  27. data/docs/_pages/thread-safety.adoc +0 -1
  28. data/docs/_references/document-api.adoc +0 -1
  29. data/docs/_tutorials/basic-usage.adoc +0 -1
  30. data/docs/_tutorials/builder-pattern.adoc +0 -1
  31. data/docs/_tutorials/namespace-handling.adoc +0 -1
  32. data/docs/_tutorials/xpath-queries.adoc +0 -1
  33. data/lib/moxml/adapter/customized_rexml/formatter.rb +2 -2
  34. data/lib/moxml/adapter/libxml.rb +34 -4
  35. data/lib/moxml/adapter/nokogiri.rb +50 -2
  36. data/lib/moxml/adapter/oga.rb +80 -3
  37. data/lib/moxml/adapter/ox.rb +70 -7
  38. data/lib/moxml/adapter/rexml.rb +45 -10
  39. data/lib/moxml/attribute.rb +6 -0
  40. data/lib/moxml/context.rb +18 -1
  41. data/lib/moxml/declaration.rb +9 -0
  42. data/lib/moxml/doctype.rb +33 -0
  43. data/lib/moxml/document.rb +14 -0
  44. data/lib/moxml/document_builder.rb +7 -0
  45. data/lib/moxml/element.rb +6 -0
  46. data/lib/moxml/error.rb +5 -5
  47. data/lib/moxml/node.rb +73 -1
  48. data/lib/moxml/processing_instruction.rb +6 -0
  49. data/lib/moxml/version.rb +1 -1
  50. data/lib/moxml/xpath/compiler.rb +2 -0
  51. data/lib/moxml/xpath/errors.rb +1 -1
  52. data/spec/integration/shared_examples/node_wrappers/declaration_behavior.rb +0 -3
  53. data/spec/moxml/declaration_preservation_spec.rb +217 -0
  54. data/spec/moxml/doctype_spec.rb +19 -3
  55. data/spec/performance/memory_usage_spec.rb +3 -2
  56. metadata +5 -3
  57. data/.ruby-version +0 -1
@@ -0,0 +1,450 @@
1
+ = XML Declaration Preservation
2
+ :toc:
3
+ :toclevels: 3
4
+
5
+ == Overview
6
+
7
+ Moxml automatically preserves the presence or absence of XML declarations (`<?xml version="1.0"?>`) when parsing and serializing documents. This ensures round-trip fidelity and compliance with standards that require specific declaration handling.
8
+
9
+ === Why This Matters
10
+
11
+ Some XML use cases require specific declaration handling:
12
+
13
+ * **SVG Files**: Often have no XML declaration
14
+ * **XML Fragments**: Should not have declarations
15
+ * **Standards Compliance**: Some specs prohibit declarations in certain contexts
16
+ * **Round-Trip Fidelity**: Parse → Modify → Serialize should preserve format
17
+
18
+ === Key Features
19
+
20
+ * **Automatic Detection**: Moxml detects whether input had a declaration
21
+ * **Automatic Preservation**: Output matches input format by default
22
+ * **Explicit Override**: Force add or remove declarations when needed
23
+ * **All Adapters**: Works across all 6 XML adapters
24
+
25
+ == Basic Usage
26
+
27
+ === Automatic Preservation
28
+
29
+ Moxml automatically preserves whether input had an XML declaration:
30
+
31
+ [source,ruby]
32
+ ----
33
+ require 'moxml'
34
+
35
+ # Document without declaration
36
+ svg = '<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>'
37
+ doc = Moxml.new.parse(svg)
38
+ doc.to_xml
39
+ # => "<svg xmlns=\"http://www.w3.org/2000/svg\"><rect/></svg>"
40
+ # ✓ No <?xml...?> added
41
+
42
+ # Document with declaration
43
+ xml = '<?xml version="1.0" encoding="UTF-8"?><root><child/></root>'
44
+ doc = Moxml.new.parse(xml)
45
+ doc.to_xml
46
+ # => "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><child/></root>"
47
+ # ✓ Declaration preserved
48
+ ----
49
+
50
+ === Checking Declaration Presence
51
+
52
+ Use the `has_xml_declaration` attribute to check if a document has a declaration:
53
+
54
+ [source,ruby]
55
+ ----
56
+ # Document without declaration
57
+ doc = Moxml.new.parse('<root/>')
58
+ doc.has_xml_declaration # => false
59
+
60
+ # Document with declaration
61
+ doc = Moxml.new.parse('<?xml version="1.0"?><root/>')
62
+ doc.has_xml_declaration # => true
63
+ ----
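For raw input that has not been parsed yet, a similar check can be approximated on the string itself. The following is a minimal sketch in plain Ruby, not Moxml's own detection logic; the constant `XML_DECLARATION` and the helper `declaration_present?` are hypothetical names introduced for illustration.

```ruby
# Hypothetical helper, not part of Moxml: reports whether a raw XML string
# begins with an XML declaration. The pattern is anchored with \A because a
# declaration only counts when it is the very first thing in the input, and
# it requires whitespace after "<?xml" so that processing instructions such
# as <?xml-stylesheet ...?> are not mistaken for a declaration.
XML_DECLARATION = /\A<\?xml\s[^>]*\?>/

def declaration_present?(xml)
  xml.match?(XML_DECLARATION)
end

declaration_present?('<root/>')                      # => false
declaration_present?('<?xml version="1.0"?><root/>') # => true
```

Once a document is parsed, `doc.has_xml_declaration` remains the supported way to ask the same question.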
64
+
65
+ == Explicit Control
66
+
67
+ === Forcing Declaration Addition
68
+
69
+ Add a declaration to documents that don't have one:
70
+
71
+ [source,ruby]
72
+ ----
73
+ svg = '<svg><rect/></svg>'
74
+ doc = Moxml.new.parse(svg)
75
+
76
+ # Force add declaration
77
+ output = doc.to_xml(declaration: true)
78
+ # => "<?xml version=\"1.0\" encoding=\"UTF-8\"?><svg><rect/></svg>"
79
+ ----
80
+
81
+ === Removing Declarations
82
+
83
+ Remove the declaration from documents that have one:
84
+
85
+ [source,ruby]
86
+ ----
87
+ xml = '<?xml version="1.0"?><root><item/></root>'
88
+ doc = Moxml.new.parse(xml)
89
+
90
+ # Force remove declaration
91
+ output = doc.to_xml(declaration: false)
92
+ # => "<root><item/></root>"
93
+ ----
94
+
95
+ == Use Cases
96
+
97
+ === SVG File Processing
98
+
99
+ SVG files often don't have XML declarations. Moxml preserves this:
100
+
101
+ [source,ruby]
102
+ ----
103
+ # Original SVG without declaration
104
+ svg_content = File.read('image.svg')
105
+ doc = Moxml.new.parse(svg_content)
106
+
107
+ # Modify SVG (add viewBox)
108
+ doc.root['viewBox'] = '0 0 100 100'
109
+
110
+ # Save - no declaration added
111
+ File.write('image.svg', doc.to_xml)
112
+ ----
113
+
114
+ === XML Fragment Generation
115
+
116
+ Create XML fragments without declarations:
117
+
118
+ [source,ruby]
119
+ ----
120
+ context = Moxml.new
121
+ doc = context.create_document
122
+
123
+ # Build fragment
124
+ root = doc.create_element('fragment')
125
+ root << doc.create_element('item')
126
+ doc.root = root
127
+
128
+ # Serialize without declaration (default for built documents)
129
+ doc.to_xml # => "<fragment><item/></fragment>"
130
+ ----
131
+
132
+ === Standards-Compliant XML
133
+
134
+ Some XML standards require or prohibit declarations:
135
+
136
+ [source,ruby]
137
+ ----
138
+ # Standard prohibits declarations
139
+ doc = Moxml.new.parse(compliant_xml_without_decl)
140
+ output = doc.to_xml # Declaration correctly absent
141
+
142
+ # Standard requires declarations
143
+ doc = Moxml.new.parse(standard_xml_with_decl)
144
+ output = doc.to_xml # Declaration correctly present
145
+ ----
146
+
147
+ === Round-Trip Processing
148
+
149
+ Preserve original format through multiple parse/serialize cycles:
150
+
151
+ [source,ruby]
152
+ ----
153
+ original = '<data><item id="1"/></data>'
154
+
155
+ # First round-trip
156
+ doc1 = Moxml.new.parse(original)
157
+ intermediate = doc1.to_xml
158
+
159
+ # Second round-trip
160
+ doc2 = Moxml.new.parse(intermediate)
161
+ final = doc2.to_xml
162
+
163
+ # All three are identical
164
+ original == intermediate && intermediate == final # => true
165
+ ----
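The round-trip property above can be turned into a reusable check. This is a sketch under stated assumptions: `round_trip_stable?` is a hypothetical helper, not a Moxml method, and the parse-and-serialize step is passed in as a block so the example runs without any gem installed.

```ruby
# Hypothetical helper, not part of Moxml: checks that repeatedly applying a
# parse-and-serialize step leaves the text unchanged. The step is supplied
# as a block so any adapter or library can be plugged in.
def round_trip_stable?(xml, cycles: 2)
  current = xml
  cycles.times do
    current = yield(current)
    return false unless current == xml
  end
  true
end

# With Moxml installed, the step would be the calls shown in this guide:
#   round_trip_stable?(original) { |s| Moxml.new.parse(s).to_xml }

# Stand-in steps so the sketch runs on its own:
preserving = ->(s) { s }                            # keeps input as-is
prepending = ->(s) { %(<?xml version="1.0"?>) + s } # always adds a declaration

original = '<data><item id="1"/></data>'
round_trip_stable?(original, &preserving) # => true
round_trip_stable?(original, &prepending) # => false
```

A step that always prepends a declaration fails the check on the first cycle, which is the kind of drift declaration preservation is meant to avoid.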
166
+
167
+ == Adapter Behavior
168
+
169
+ All 6 adapters support declaration preservation:
170
+
171
+ [cols="1,3"]
172
+ |===
173
+ |Adapter |Implementation
174
+
175
+ |Nokogiri
176
+ |Uses `SaveOptions::NO_DECLARATION` flag
177
+
178
+ |Oga
179
+ |Custom serialization logic
180
+
181
+ |REXML
182
+ |Conditional declaration output
183
+
184
+ |Ox
185
+ |Declaration control in serialize
186
+
187
+ |LibXML
188
+ |Custom serializer respects flag
189
+
190
+ |HeadedOx
191
+ |Inherits Ox implementation
192
+ |===
193
+
194
+ == Advanced Usage
195
+
196
+ === Programmatically Built Documents
197
+
198
+ Documents built from scratch default to no declaration:
199
+
200
+ [source,ruby]
201
+ ----
202
+ doc = Moxml.new.create_document
203
+ root = doc.create_element('config')
204
+ doc.root = root
205
+
206
+ doc.has_xml_declaration # => false
207
+ doc.to_xml # => "<config/>"
208
+
209
+ # Explicitly add declaration if needed
210
+ doc.to_xml(declaration: true)
211
+ # => "<?xml version=\"1.0\" encoding=\"UTF-8\"?><config/>"
212
+ ----
213
+
214
+ === Element Serialization
215
+
216
+ Only document nodes can have declarations. Element serialization never includes declarations:
217
+
218
+ [source,ruby]
219
+ ----
220
+ doc = Moxml.new.parse('<?xml version="1.0"?><root><child/></root>')
221
+ element = doc.root
222
+
223
+ # Element serialization - no declaration
224
+ element.to_xml # => "<root><child/></root>"
225
+
226
+ # Even with explicit request (ignored for elements)
227
+ element.to_xml(declaration: true) # => "<root><child/></root>"
228
+ ----
229
+
230
+ === Custom Declaration Attributes
231
+
232
+ When forcing a declaration, the default `version` and `encoding` attributes are used. The encoding can be overridden with the `encoding` option:
233
+
234
+ [source,ruby]
235
+ ----
236
+ doc = Moxml.new.parse('<root/>')
237
+
238
+ # Default declaration
239
+ doc.to_xml(declaration: true)
240
+ # => "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
241
+
242
+ # Custom encoding via adapter
243
+ doc.to_xml(declaration: true, encoding: "ISO-8859-1")
244
+ # => "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
245
+ ----
246
+
247
+ == Best Practices
248
+
249
+ === Let Moxml Handle It
250
+
251
+ In most cases, rely on automatic preservation:
252
+
253
+ [source,ruby]
254
+ ----
255
+ # Good - automatic preservation
256
+ doc = Moxml.new.parse(xml_content)
257
+ modified = process_document(doc)
258
+ output = doc.to_xml # Declaration preserved automatically
259
+
260
+ # Only use explicit control when required by specific needs
261
+ output = doc.to_xml(declaration: false) # Explicit requirement
262
+ ----
263
+
264
+ === Check Declaration Before Processing
265
+
266
+ Know what you're working with:
267
+
268
+ [source,ruby]
269
+ ----
270
+ doc = Moxml.new.parse(xml_source)
271
+
272
+ if doc.has_xml_declaration
273
+ # Handle documents with declarations
274
+ process_with_declaration(doc)
275
+ else
276
+ # Handle fragments without declarations
277
+ process_fragment(doc)
278
+ end
279
+ ----
280
+
281
+ === Document Your Requirements
282
+
283
+ Make declaration requirements explicit in code:
284
+
285
+ [source,ruby]
286
+ ----
287
+ def save_svg(doc)
288
+ # SVG files should not have XML declarations
289
+ raise "SVG has declaration" if doc.has_xml_declaration
290
+ File.write('output.svg', doc.to_xml)
291
+ end
292
+
293
+ def save_xml_config(doc)
294
+ # Config files require declarations
295
+ File.write('config.xml', doc.to_xml(declaration: true))
296
+ end
297
+ ----
298
+
299
+ == Migration from Previous Versions
300
+
301
+ === Behavior Change
302
+
303
+ In Moxml versions before 0.2.1, serialization *always* added an XML declaration. Starting with 0.2.1, behavior changed to preserve input format:
304
+
305
+ [cols="1,2,2"]
306
+ |===
307
+ |Scenario |Before v0.2.1 |v0.2.1+
308
+
309
+ |Parse without declaration
310
+ |Added declaration ❌
311
+ |No declaration ✓
312
+
313
+ |Parse with declaration
314
+ |Preserved declaration ✓
315
+ |Preserved declaration ✓
316
+
317
+ |Built document
318
+ |Added declaration
319
+ |No declaration (can override)
320
+ |===
321
+
322
+ === Update Your Code
323
+
324
+ If you relied on automatic declaration addition:
325
+
326
+ [source,ruby]
327
+ ----
328
+ # Before (relied on automatic declaration)
329
+ doc = Moxml.new.parse('<root/>')
330
+ output = doc.to_xml # Had declaration
331
+
332
+ # After (explicitly request if needed)
333
+ doc = Moxml.new.parse('<root/>')
334
+ output = doc.to_xml(declaration: true) # Force add
335
+ ----
336
+
337
+ === Minimal Impact
338
+
339
+ Most code will see **no change** because:
340
+
341
+ * Documents with declarations still preserve them
342
+ * Only fragments without declarations behave differently
343
+ * New behavior is arguably more correct
344
+
345
+ == Troubleshooting
346
+
347
+ === Declaration Not Preserved
348
+
349
+ If the declaration isn't being preserved, check:
350
+
351
+ 1. **Input Format**: Verify input actually has `<?xml...?>`
352
+ +
353
+ [source,ruby]
354
+ ----
355
+ xml_content = File.read('file.xml')
356
+ puts "Has declaration: #{xml_content.strip.start_with?('<?xml')}"
357
+ ----
358
+
359
+ 2. **Explicit Override**: Check if code explicitly removes it
360
+ +
361
+ [source,ruby]
362
+ ----
363
+ # This will remove declaration regardless of input
364
+ doc.to_xml(declaration: false)
365
+ ----
366
+
367
+ 3. **Element vs Document**: Only documents can have declarations
368
+ +
369
+ [source,ruby]
370
+ ----
371
+ element = doc.root
372
+ element.to_xml # Never has declaration (correct)
373
+ ----
374
+
375
+ === Unwanted Declaration
376
+
377
+ If a declaration is added when you don't want it:
378
+
379
+ [source,ruby]
380
+ ----
381
+ # Solution 1: Parse input without declaration
382
+ svg = '<svg><rect/></svg>' # No <?xml...?>
383
+ doc = Moxml.new.parse(svg)
384
+ doc.to_xml # No declaration
385
+
386
+ # Solution 2: Explicitly remove
387
+ doc.to_xml(declaration: false)
388
+
389
+ # Solution 3: Check and fix source
390
+ if doc.has_xml_declaration
391
+ # Input has declaration - remove from source or override
392
+ output = doc.to_xml(declaration: false)
393
+ end
394
+ ----
395
+
396
+ === Whitespace Before Declaration
397
+
398
+ XML declarations must be at the document start. Whitespace before the declaration makes it invalid:
399
+
400
+ [source,ruby]
401
+ ----
402
+ # Invalid - whitespace before declaration
403
+ invalid = ' <?xml version="1.0"?><root/>'
404
+ doc = Moxml.new.parse(invalid) # May raise error depending on adapter
405
+
406
+ # Valid - declaration at start
407
+ valid = '<?xml version="1.0"?><root/>'
408
+ doc = Moxml.new.parse(valid) # Works correctly
409
+ ----
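When the source of the input cannot be fixed, leading whitespace can be removed before parsing. The sketch below assumes that discarding whitespace ahead of the declaration is acceptable for the input at hand; `strip_before_declaration` is a hypothetical helper and not part of Moxml.

```ruby
# Hypothetical pre-processing step, not part of Moxml: drops whitespace that
# precedes an XML declaration so the declaration sits at the very start.
# Input without a declaration is returned unchanged.
def strip_before_declaration(xml)
  xml.sub(/\A\s+(?=<\?xml\s)/, '')
end

strip_before_declaration(%( \n<?xml version="1.0"?><root/>))
# => "<?xml version=\"1.0\"?><root/>"
strip_before_declaration('  <root/>')
# => "  <root/>"
```

The lookahead keeps the helper from touching documents that have no declaration, where leading whitespace is harmless.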
410
+
411
+ == API Reference
412
+
413
+ === Document Attributes
414
+
415
+ ==== has_xml_declaration
416
+
417
+ [source,ruby]
418
+ ----
419
+ doc.has_xml_declaration # => Boolean
420
+ ----
421
+
422
+ Returns `true` if the document was parsed from XML that contained an XML declaration, `false` otherwise.
423
+
424
+ * Read/write attribute (can be manually set)
425
+ * Defaults to `false` for programmatically built documents
426
+ * Automatically set during parsing
427
+
428
+ === Serialization Options
429
+
430
+ ==== declaration
431
+
432
+ [source,ruby]
433
+ ----
434
+ doc.to_xml(declaration: true) # Force include declaration
435
+ doc.to_xml(declaration: false) # Force exclude declaration
436
+ doc.to_xml # Use automatic preservation
437
+ ----
438
+
439
+ Controls whether the XML declaration is included in serialized output:
440
+
441
+ * `true`: Always include declaration
442
+ * `false`: Never include declaration
443
+ * Not specified: Use `has_xml_declaration` value (automatic preservation)
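The three cases listed above amount to a small decision rule. The following is an illustrative model only, not Moxml's implementation; `include_declaration?` is a hypothetical name, and `nil` stands in for "option not specified".

```ruby
# Hypothetical model of the three-way `declaration:` option described above.
# `option` is true, false, or nil (not specified); `has_xml_declaration` is
# the flag recorded when the document was parsed.
def include_declaration?(option, has_xml_declaration)
  return has_xml_declaration if option.nil?

  option
end

include_declaration?(nil, true)   # => true  (automatic preservation)
include_declaration?(nil, false)  # => false (automatic preservation)
include_declaration?(true, false) # => true  (forced on)
include_declaration?(false, true) # => false (forced off)
```

An explicit `true` or `false` always wins; the recorded flag is consulted only when the caller expresses no preference.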
444
+
445
+ == See Also
446
+
447
+ * link:parsing-xml[Parsing XML] - How to parse XML documents
448
+ * link:modifying-xml[Modifying XML] - Working with parsed documents
449
+ * link:../adapters/index[Adapters] - Adapter-specific behavior
450
+ * link:../best-practices[Best Practices] - General XML processing guidelines
@@ -1,4 +1,4 @@
1
- = Moxml adapter compatibility matrix
1
+ = Adapter compatibility matrix
2
2
  :toc:
3
3
  :toc-placement!:
4
4
 
@@ -1,12 +1,12 @@
1
1
  ---
2
- title: HeadedOx adapter
2
+ title: HeadedOx
3
3
  parent: Adapters
4
4
  nav_order: 6
5
5
  ---
6
6
 
7
- === HeadedOx adapter
7
+ == HeadedOx adapter
8
8
 
9
- ==== General
9
+ === General
10
10
 
11
11
  The HeadedOx adapter combines Ox's fast C-based XML parsing with Moxml's
12
12
  comprehensive pure Ruby XPath 1.0 engine.
@@ -39,7 +39,7 @@ cheap = doc.xpath('//book[@price <= sum(//book/@price) div count(//book)]')
39
39
  IMPORTANT: For complete XPath 1.0 specification with zero limitations today, use
40
40
  Nokogiri or Oga adapters.
41
41
 
42
- ==== Features
42
+ === Features
43
43
 
44
44
  * Fast XML parsing (Ox C extension) - Same speed as standard Ox
45
45
  * 6 of 13 XPath axes (46% - covers 80% of common usage patterns)
@@ -48,7 +48,7 @@ Nokogiri or Oga adapters.
48
48
  * Expression compilation and caching (1000-entry LRU cache)
49
49
  * Document construction and serialization through Ox
50
50
 
51
- ==== Architecture
51
+ === Architecture
52
52
 
53
53
  HeadedOx is a **hybrid adapter** that layers Moxml's pure Ruby XPath engine on
54
54
  top of Ox's fast C parser:
@@ -78,7 +78,7 @@ top of Ox's fast C parser:
78
78
  └─────────────────────────┘
79
79
  ----
80
80
 
81
- ==== Known limitations
81
+ === Known limitations
82
82
 
83
83
  The following 16 test failures represent architectural boundaries in the Ox gem,
84
84
  not bugs in HeadedOx:
@@ -100,7 +100,7 @@ See link:docs/HEADED_OX_LIMITATIONS.md[HEADED_OX_LIMITATIONS.md] for:
100
100
  * When to use HeadedOx vs other adapters decision guide
101
101
  * Future roadmap if Ox adds namespace introspection API
102
102
 
103
- ==== When to Use HeadedOx
103
+ === When to Use HeadedOx
104
104
 
105
105
  You can use HeadedOx instead of Ox for all XML parsing needs, except when
106
106
  certain advanced XPath features are required.
@@ -126,7 +126,7 @@ link:docs/headed-ox.adoc[HeadedOx Implementation Guide] and
126
126
  link:docs/HEADED_OX_LIMITATIONS.md[HeadedOx Limitations Documentation].
127
127
 
128
128
 
129
- ==== XPath capabilities
129
+ === XPath capabilities
130
130
 
131
131
  [cols="1,1,4"]
132
132
  |===
@@ -169,7 +169,7 @@ operator predicates, complex nested expressions
169
169
  |===
170
170
 
171
171
 
172
- ==== What XPath queries work in HeadedOx
172
+ === What XPath queries work in HeadedOx
173
173
 
174
174
  NOTE: This table is as of v0.2.0.
175
175
 
@@ -1,6 +1,5 @@
1
1
  ---
2
2
  title: Adapters
3
- parent: Overview
4
3
  nav_order: 5
5
4
  has_children: true
6
5
  ---
@@ -1,7 +1,6 @@
1
1
  ---
2
- title: LibXML adapter
2
+ title: LibXML
3
3
  parent: Adapters
4
- grand_parent: Overview
5
4
  nav_order: 2
6
5
  ---
7
6
 
@@ -1,7 +1,6 @@
1
1
  ---
2
- title: Nokogiri adapter
2
+ title: Nokogiri
3
3
  parent: Adapters
4
- grand_parent: Overview
5
4
  nav_order: 1
6
5
  ---
7
6
 
@@ -1,7 +1,6 @@
1
1
  ---
2
- title: Oga adapter
2
+ title: Oga
3
3
  parent: Adapters
4
- grand_parent: Overview
5
4
  nav_order: 3
6
5
  ---
7
6
 
@@ -1,5 +1,5 @@
1
1
  ---
2
- title: Ox adapter
2
+ title: Ox
3
3
  parent: Adapters
4
4
  nav_order: 5
5
5
  ---
@@ -11,6 +11,7 @@ nav_order: 5
11
11
  Ox is the fastest XML parser available for Ruby, providing excellent performance for simple to moderately complex XML documents.
12
12
 
13
13
  **Best for:**
14
+
14
15
  * Maximum parsing speed
15
16
  * Simple document structures
16
17
  * Memory-constrained environments
@@ -1,7 +1,6 @@
1
1
  ---
2
- title: REXML adapter
2
+ title: REXML
3
3
  parent: Adapters
4
- grand_parent: Overview
5
4
  nav_order: 4
6
5
  ---
7
6
 
@@ -284,7 +283,7 @@ end
284
283
  === References
285
284
 
286
285
  * link:https://github.com/ruby/rexml[REXML on GitHub]
287
- * link:https://ruby-doc.org/stdlib/libdoc/rexml/rdoc/REXML.html[REXML documentation]
286
+ * link:https://ruby-doc.org/stdlib/libdoc/rexml/rdoc/REXML[REXML documentation]
288
287
 
289
288
  === See also
290
289
 
@@ -1,6 +1,5 @@
1
1
  ---
2
2
  title: Best practices
3
- parent: Overview
4
3
  nav_order: 9
5
4
  ---
6
5
 
@@ -1,6 +1,5 @@
1
1
  ---
2
2
  title: Compatibility
3
- parent: Overview
4
3
  nav_order: 6
5
4
  ---
6
5
 
@@ -1,6 +1,5 @@
1
1
  ---
2
2
  title: Configuration
3
- parent: Overview
4
3
  nav_order: 7
5
4
  ---
6
5
 
@@ -1,6 +1,5 @@
1
1
  ---
2
2
  title: Error handling
3
- parent: Overview
4
3
  nav_order: 8
5
4
  ---
6
5