moxml 0.1.9 → 0.1.11
- checksums.yaml +4 -4
- data/.github/workflows/docs.yml +1 -1
- data/.github/workflows/rake.yml +16 -13
- data/.github/workflows/release.yml +1 -0
- data/.github/workflows/round-trip.yml +74 -0
- data/.gitignore +1 -0
- data/.rubocop.yml +1 -0
- data/.rubocop_todo.yml +160 -38
- data/Gemfile +2 -1
- data/README.adoc +287 -20
- data/Rakefile +11 -0
- data/data/w3c_entities.json +2131 -0
- data/docs/ENTITY_SUPPORT_FOR_LUTAML_MODEL.md +102 -0
- data/docs/_guides/index.adoc +14 -12
- data/docs/_guides/node-api-consistency.adoc +572 -0
- data/docs/_guides/xml-declaration.adoc +5 -5
- data/docs/_pages/adapters/ox.adoc +30 -0
- data/docs/_pages/adapters/rexml.adoc +1 -1
- data/docs/_pages/configuration.adoc +43 -0
- data/docs/_pages/node-api-reference.adoc +128 -3
- data/docs/_tutorials/namespace-handling.adoc +21 -0
- data/examples/rss_parser/rss_parser.rb +1 -3
- data/lib/moxml/adapter/base.rb +26 -2
- data/lib/moxml/adapter/headed_ox.rb +5 -4
- data/lib/moxml/adapter/libxml.rb +18 -3
- data/lib/moxml/adapter/nokogiri.rb +26 -2
- data/lib/moxml/adapter/oga.rb +137 -20
- data/lib/moxml/adapter/ox.rb +29 -3
- data/lib/moxml/adapter/rexml.rb +54 -7
- data/lib/moxml/attribute.rb +6 -0
- data/lib/moxml/builder.rb +6 -0
- data/lib/moxml/config.rb +52 -1
- data/lib/moxml/context.rb +21 -2
- data/lib/moxml/doctype.rb +33 -0
- data/lib/moxml/document.rb +6 -1
- data/lib/moxml/document_builder.rb +45 -1
- data/lib/moxml/element.rb +10 -3
- data/lib/moxml/entity_reference.rb +29 -0
- data/lib/moxml/entity_registry.rb +278 -0
- data/lib/moxml/error.rb +5 -5
- data/lib/moxml/node.rb +22 -8
- data/lib/moxml/node_set.rb +10 -6
- data/lib/moxml/processing_instruction.rb +6 -0
- data/lib/moxml/version.rb +1 -1
- data/lib/moxml/xml_utils.rb +25 -2
- data/lib/moxml/xpath/errors.rb +1 -1
- data/lib/moxml.rb +1 -0
- data/spec/consistency/README.md +3 -1
- data/spec/consistency/round_trip_spec.rb +479 -0
- data/spec/examples/readme_examples_spec.rb +1 -1
- data/spec/fixtures/round-trips/metanorma/a.xml +66 -0
- data/spec/fixtures/round-trips/metanorma/bilingual-en.xml +7682 -0
- data/spec/fixtures/round-trips/metanorma/bilingual-fr.xml +7520 -0
- data/spec/fixtures/round-trips/metanorma/bilingual.presentation.xml +21211 -0
- data/spec/fixtures/round-trips/metanorma/collection1.xml +313 -0
- data/spec/fixtures/round-trips/metanorma/collection1nested.xml +291 -0
- data/spec/fixtures/round-trips/metanorma/collection_docinline.xml +544 -0
- data/spec/fixtures/round-trips/metanorma/collection_full.xml +1776 -0
- data/spec/fixtures/round-trips/metanorma/dummy.1.xml +295 -0
- data/spec/fixtures/round-trips/metanorma/dummy.xml +349 -0
- data/spec/fixtures/round-trips/metanorma/footnotes.xml +70 -0
- data/spec/fixtures/round-trips/metanorma/iho.xml +116 -0
- data/spec/fixtures/round-trips/metanorma/rice-amd.final.xml +186 -0
- data/spec/fixtures/round-trips/metanorma/rice-amd.final_1.xml +180 -0
- data/spec/fixtures/round-trips/metanorma/rice-en.final.norepo.xml +116 -0
- data/spec/fixtures/round-trips/metanorma/rice-en.final.xml +149 -0
- data/spec/fixtures/round-trips/metanorma/rice-en.final_1.xml +144 -0
- data/spec/fixtures/round-trips/metanorma/rice1-en.final.xml +120 -0
- data/spec/fixtures/round-trips/metanorma/rice2-en.final.xml +116 -0
- data/spec/fixtures/round-trips/metanorma/test_sectionsplit.xml +119 -0
- data/spec/fixtures/round-trips/niso-jats/bmj_sample.xml +1068 -0
- data/spec/fixtures/round-trips/niso-jats/element_citation.xml +7 -0
- data/spec/fixtures/round-trips/niso-jats/pnas_sample.xml +3768 -0
- data/spec/fixtures/round-trips/rfcxml/rfc8881.xml +45848 -0
- data/spec/fixtures/round-trips/rfcxml/rfc8994.xml +6607 -0
- data/spec/fixtures/round-trips/rfcxml/rfc9000.xml +9064 -0
- data/spec/fixtures/round-trips/rfcxml/rfc9043.xml +5527 -0
- data/spec/fixtures/round-trips/rfcxml/rfc9051.xml +14286 -0
- data/spec/fixtures/round-trips/rfcxml/rfc9110.xml +18156 -0
- data/spec/fixtures/round-trips/rfcxml/rfc9260.xml +9136 -0
- data/spec/fixtures/round-trips/rfcxml/rfc9293.xml +8300 -0
- data/spec/fixtures/round-trips/rfcxml/rfc9380.xml +8916 -0
- data/spec/fixtures/round-trips/rfcxml/rfc9420.xml +8927 -0
- data/spec/fixtures/w3c/namespaces/1.0/001.xml +7 -0
- data/spec/fixtures/w3c/namespaces/1.0/002.xml +8 -0
- data/spec/fixtures/w3c/namespaces/1.0/003.xml +7 -0
- data/spec/fixtures/w3c/namespaces/1.0/004.xml +7 -0
- data/spec/fixtures/w3c/namespaces/1.0/005.xml +7 -0
- data/spec/fixtures/w3c/namespaces/1.0/006.xml +7 -0
- data/spec/fixtures/w3c/namespaces/1.0/007.xml +20 -0
- data/spec/fixtures/w3c/namespaces/1.0/008.xml +20 -0
- data/spec/fixtures/w3c/namespaces/1.0/009.xml +19 -0
- data/spec/fixtures/w3c/namespaces/1.0/010.xml +19 -0
- data/spec/fixtures/w3c/namespaces/1.0/011.xml +20 -0
- data/spec/fixtures/w3c/namespaces/1.0/012.xml +19 -0
- data/spec/fixtures/w3c/namespaces/1.0/013.xml +5 -0
- data/spec/fixtures/w3c/namespaces/1.0/014.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/015.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/016.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/017.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/018.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/019.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/020.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/021.xml +6 -0
- data/spec/fixtures/w3c/namespaces/1.0/022.xml +6 -0
- data/spec/fixtures/w3c/namespaces/1.0/023.xml +6 -0
- data/spec/fixtures/w3c/namespaces/1.0/024.xml +6 -0
- data/spec/fixtures/w3c/namespaces/1.0/025.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/026.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/027.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/028.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/029.xml +4 -0
- data/spec/fixtures/w3c/namespaces/1.0/030.xml +4 -0
- data/spec/fixtures/w3c/namespaces/1.0/031.xml +4 -0
- data/spec/fixtures/w3c/namespaces/1.0/032.xml +5 -0
- data/spec/fixtures/w3c/namespaces/1.0/033.xml +4 -0
- data/spec/fixtures/w3c/namespaces/1.0/034.xml +3 -0
- data/spec/fixtures/w3c/namespaces/1.0/035.xml +8 -0
- data/spec/fixtures/w3c/namespaces/1.0/036.xml +8 -0
- data/spec/fixtures/w3c/namespaces/1.0/037.xml +8 -0
- data/spec/fixtures/w3c/namespaces/1.0/038.xml +8 -0
- data/spec/fixtures/w3c/namespaces/1.0/039.xml +10 -0
- data/spec/fixtures/w3c/namespaces/1.0/040.xml +9 -0
- data/spec/fixtures/w3c/namespaces/1.0/041.xml +8 -0
- data/spec/fixtures/w3c/namespaces/1.0/042.xml +4 -0
- data/spec/fixtures/w3c/namespaces/1.0/043.xml +7 -0
- data/spec/fixtures/w3c/namespaces/1.0/044.xml +7 -0
- data/spec/fixtures/w3c/namespaces/1.0/045.xml +7 -0
- data/spec/fixtures/w3c/namespaces/1.0/046.xml +10 -0
- data/spec/fixtures/w3c/namespaces/1.0/047.xml +4 -0
- data/spec/fixtures/w3c/namespaces/1.0/048.xml +5 -0
- data/spec/fixtures/w3c/namespaces/1.0/LICENSE.md +32 -0
- data/spec/fixtures/w3c/namespaces/1.0/README.adoc +42 -0
- data/spec/fixtures/w3c/namespaces/1.0/rmt-ns10.xml +156 -0
- data/spec/integration/shared_examples/node_wrappers/namespace_behavior.rb +14 -2
- data/spec/integration/shared_examples/w3c_namespace_examples.rb +10 -0
- data/spec/integration/w3c_namespace_spec.rb +69 -0
- data/spec/moxml/adapter/libxml_spec.rb +7 -1
- data/spec/moxml/adapter/oga_spec.rb +92 -0
- data/spec/moxml/config_spec.rb +75 -0
- data/spec/moxml/doctype_spec.rb +19 -3
- data/spec/moxml/entity_registry_spec.rb +184 -0
- data/spec/moxml/error_spec.rb +2 -2
- data/spec/moxml/namespace_uri_validation_spec.rb +140 -0
- data/spec/moxml/xpath/axes_spec.rb +3 -4
- data/spec/performance/xpath_benchmark_spec.rb +6 -54
- data/spec/support/w3c_namespace_helpers.rb +41 -0
- data/spec/unit/rexml_isolated_test.rb +271 -0
- metadata +99 -3
- data/.ruby-version +0 -1
|
@@ -0,0 +1,102 @@
|
|
|
1
|
+
# Entity Support for lutaml-model Team
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
Moxml now supports entity restoration during parsing. This feature ensures that XML entities (like `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) are preserved as `EntityReference` nodes rather than being resolved to their character values during parsing.
|
|
6
|
+
|
|
7
|
+
## Key Concept: Entity Restoration
|
|
8
|
+
|
|
9
|
+
By default, XML parsers resolve entities during parsing:
|
|
10
|
+
- Input: `<root>foo&amp;bar</root>`
|
|
11
|
+
- Default behavior: Text node contains `foo&bar` (resolved `&amp;`)
|
|
12
|
+
- With entity restoration: Text node contains `foo` + EntityReference `&amp;` + Text node `bar`
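
The split described above can be pictured with a small, self-contained sketch. It is an illustration only and not Moxml's implementation: `ENTITY_PATTERN` and `split_entities` are hypothetical names, and the sketch assumes it works on raw character data and only recognizes named references.

```ruby
# Hypothetical illustration only; this is not Moxml's implementation.
# Matches a named entity reference such as &amp; and captures the name.
ENTITY_PATTERN = /&([A-Za-z][A-Za-z0-9]*);/

# Split raw character data into [:text, string] and [:entity_ref, name] pairs.
def split_entities(raw)
  segments = []
  pos = 0
  raw.scan(ENTITY_PATTERN) do
    match = Regexp.last_match
    # Keep any plain text that sits before this entity reference.
    segments << [:text, raw[pos...match.begin(0)]] if match.begin(0) > pos
    segments << [:entity_ref, match[1]]
    pos = match.end(0)
  end
  # Keep any trailing text after the last entity reference.
  segments << [:text, raw[pos..]] if pos < raw.length
  segments
end

p split_entities("foo&amp;bar")
# → [[:text, "foo"], [:entity_ref, "amp"], [:text, "bar"]]
```

The three resulting segments correspond to the Text, EntityReference, and Text nodes listed above.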
|
|
13
|
+
|
|
14
|
+
## Enabling Entity Restoration
|
|
15
|
+
|
|
16
|
+
### Option 1: Per-Context Configuration
|
|
17
|
+
|
|
18
|
+
```ruby
|
|
19
|
+
context = Moxml.new(:nokogiri, restore_entities: true)
|
|
20
|
+
doc = context.parse('<root>foo&amp;bar</root>')
|
|
21
|
+
# doc.to_xml will preserve &amp; as EntityReference
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
### Option 2: Global Configuration
|
|
25
|
+
|
|
26
|
+
```ruby
|
|
27
|
+
Moxml.configure do |config|
|
|
28
|
+
config.restore_entities = true
|
|
29
|
+
end
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
## Preloading Entity Sets
|
|
33
|
+
|
|
34
|
+
You can preload standard entity sets (HTML5, MathML, ISO) for faster entity resolution:
|
|
35
|
+
|
|
36
|
+
```ruby
|
|
37
|
+
context = Moxml.new(:nokogiri,
|
|
38
|
+
restore_entities: true,
|
|
39
|
+
preload_entity_sets: [:html5, :mathml]
|
|
40
|
+
)
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
## W3C XML Core WG Compliance
|
|
44
|
+
|
|
45
|
+
Per W3C XML Core WG guidance:
|
|
46
|
+
- Standard XML entities (`amp`, `lt`, `gt`, `quot`, `apos`) are implicitly declared per XML spec
|
|
47
|
+
- The `EntityRegistry` class tracks all known entities and their Unicode codepoints
|
|
48
|
+
- Entity names are preserved through round-trip serialization
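
As a rough picture of what tracking entities and their codepoints can look like, here is a minimal sketch limited to the five predefined XML entities. It is a hypothetical stand-in and not the `EntityRegistry` API: `PREDEFINED_ENTITIES`, `entity_char`, and `entity_name_for` are names assumed for illustration.

```ruby
# Hypothetical stand-in for illustration; not the real EntityRegistry API.
# The five entities every XML processor must recognize, with their codepoints.
PREDEFINED_ENTITIES = {
  "amp"  => 0x26, # &
  "lt"   => 0x3C, # <
  "gt"   => 0x3E, # >
  "quot" => 0x22, # "
  "apos" => 0x27  # '
}.freeze

# Look up the character an entity name stands for (nil if the name is unknown).
def entity_char(name)
  codepoint = PREDEFINED_ENTITIES[name]
  codepoint && [codepoint].pack("U")
end

# Reverse lookup: which entity name, if any, represents this character?
def entity_name_for(char)
  PREDEFINED_ENTITIES.key(char.ord)
end

p entity_char("amp")     # → "&"
p entity_name_for("<")   # → "lt"
```

A lookup in both directions is what makes it possible to keep the entity name, rather than only the character, through a round trip.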
|
|
49
|
+
|
|
50
|
+
## What lutaml-model Needs to Know
|
|
51
|
+
|
|
52
|
+
### 1. Document Structure with Entities
|
|
53
|
+
|
|
54
|
+
When entity restoration is enabled, documents containing entities will have mixed node types:
|
|
55
|
+
|
|
56
|
+
```
|
|
57
|
+
Document
|
|
58
|
+
└── Element: root
|
|
59
|
+
├── Text: "foo"
|
|
60
|
+
├── EntityReference: "amp" # Represents &
|
|
61
|
+
└── Text: "bar"
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
### 2. Serialization
|
|
65
|
+
|
|
66
|
+
`doc.to_xml` will serialize EntityReference nodes as proper XML entity syntax:
|
|
67
|
+
- `EntityReference("amp")` → `&amp;`
|
|
68
|
+
- `EntityReference("lt")` → `&lt;`
|
|
69
|
+
- etc.
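
The mapping above amounts to wrapping the entity name in `&` and `;` on output. A minimal sketch follows, assuming a simplified model in which content is a list of `[:text, string]` and `[:entity_ref, name]` pairs. `serialize_segments` is a hypothetical helper, not a Moxml method, and text segments are assumed to be already safe to emit.

```ruby
# Hypothetical segment model for illustration; not Moxml's node classes.
# Each segment is either [:text, string] or [:entity_ref, name].
def serialize_segments(segments)
  segments.map do |type, value|
    case type
    when :entity_ref then "&#{value};" # emit the reference by name, e.g. &amp;
    when :text then value              # assumed to need no further escaping
    else raise ArgumentError, "unknown segment type: #{type.inspect}"
    end
  end.join
end

puts serialize_segments([[:text, "foo"], [:entity_ref, "amp"], [:text, "bar"]])
# prints foo&amp;bar
```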
|
|
70
|
+
|
|
71
|
+
### 3. XPath Queries
|
|
72
|
+
|
|
73
|
+
EntityReference nodes participate in XPath queries like any other node. You can query for them specifically if needed.
|
|
74
|
+
|
|
75
|
+
### 4. Configuration Inheritance
|
|
76
|
+
|
|
77
|
+
When using `Moxml::Context`, the entity restoration setting is preserved through document operations. However, when creating new contexts, you need to set the option explicitly.
|
|
78
|
+
|
|
79
|
+
## Example Usage in lutaml-model
|
|
80
|
+
|
|
81
|
+
```ruby
|
|
82
|
+
# Parse XML with entities preserved
|
|
83
|
+
context = Moxml.new(:nokogiri, restore_entities: true)
|
|
84
|
+
doc = context.parse(your_xml_string)
|
|
85
|
+
|
|
86
|
+
# Serialize back - entities are preserved
|
|
87
|
+
output = doc.to_xml
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
## Testing Considerations
|
|
91
|
+
|
|
92
|
+
When writing tests for models that handle XML with entities:
|
|
93
|
+
1. Enable `restore_entities: true` in your test context
|
|
94
|
+
2. Verify that EntityReference nodes are created for entities in text
|
|
95
|
+
3. Test round-trip: parse → serialize → parse should preserve entities
|
|
96
|
+
|
|
97
|
+
## Files of Interest
|
|
98
|
+
|
|
99
|
+
- `lib/moxml/entity_registry.rb` - Entity definitions and lookup
|
|
100
|
+
- `lib/moxml/config.rb` - Configuration options
|
|
101
|
+
- `lib/moxml/document_builder.rb` - Entity restoration logic
|
|
102
|
+
- `lib/moxml/entity_reference.rb` - EntityReference node class
|
data/docs/_guides/index.adoc
CHANGED
|
@@ -10,23 +10,25 @@ Task-oriented guides for common XML processing operations with Moxml.
|
|
|
10
10
|
=== Document operations
|
|
11
11
|
|
|
12
12
|
link:parsing-xml[Parsing XML]::
|
|
13
|
-
Parse XML from strings, files, and IO streams
|
|
13
|
+
Parse XML from strings, files, and IO streams
|
|
14
14
|
|
|
15
|
-
link:
|
|
16
|
-
|
|
15
|
+
link:working-with-documents[Working with Documents]::
|
|
16
|
+
Create and manipulate XML documents
|
|
17
17
|
|
|
18
|
-
link:xml
|
|
19
|
-
|
|
18
|
+
link:modifying-xml[Modifying XML]::
|
|
19
|
+
Edit elements, attributes, and content
|
|
20
20
|
|
|
21
|
-
link:
|
|
22
|
-
|
|
23
|
-
manipulation.
|
|
21
|
+
link:sax-parsing[SAX Parsing]::
|
|
22
|
+
Memory-efficient event-driven parsing
|
|
24
23
|
|
|
25
|
-
link:
|
|
26
|
-
|
|
24
|
+
link:advanced-features[Advanced Features]::
|
|
25
|
+
Namespaces, XPath, and more
|
|
26
|
+
|
|
27
|
+
link:node-api-consistency[Node API Consistency]::
|
|
28
|
+
Understanding node types and their APIs
|
|
27
29
|
|
|
28
|
-
link:
|
|
29
|
-
|
|
30
|
+
link:development-testing[Development & Testing]::
|
|
31
|
+
Contributing to Moxml
|
|
30
32
|
|
|
31
33
|
=== Querying and traversal
|
|
32
34
|
|
|
@@ -0,0 +1,572 @@
|
|
|
1
|
+
= Node API consistency guide
|
|
2
|
+
:toc:
|
|
3
|
+
:toclevels: 3
|
|
4
|
+
|
|
5
|
+
== Overview
|
|
6
|
+
|
|
7
|
+
This guide documents the API surface of all node types in Moxml, providing clear
|
|
8
|
+
expectations for developers about which methods are available on which node
|
|
9
|
+
types.
|
|
10
|
+
|
|
11
|
+
== Node Type Hierarchy
|
|
12
|
+
|
|
13
|
+
[source]
|
|
14
|
+
----
|
|
15
|
+
Node (abstract base)
|
|
16
|
+
├── Document
|
|
17
|
+
├── Element
|
|
18
|
+
├── Attribute
|
|
19
|
+
├── Text
|
|
20
|
+
├── Cdata
|
|
21
|
+
├── Comment
|
|
22
|
+
├── ProcessingInstruction
|
|
23
|
+
├── Declaration
|
|
24
|
+
└── Doctype
|
|
25
|
+
----
|
|
26
|
+
|
|
27
|
+
== Common Node Methods
|
|
28
|
+
|
|
29
|
+
All node types inherit these methods from `Moxml::Node`:
|
|
30
|
+
|
|
31
|
+
=== Document Navigation
|
|
32
|
+
|
|
33
|
+
[cols="1,3,1"]
|
|
34
|
+
|===
|
|
35
|
+
| Method | Description | Always Available?
|
|
36
|
+
|
|
37
|
+
| `#document`
|
|
38
|
+
| Returns the containing document
|
|
39
|
+
| ✅ Yes
|
|
40
|
+
|
|
41
|
+
| `#parent`
|
|
42
|
+
| Returns the parent node
|
|
43
|
+
| ✅ Yes
|
|
44
|
+
|
|
45
|
+
| `#children`
|
|
46
|
+
| Returns a NodeSet of child nodes
|
|
47
|
+
| ✅ Yes (may be empty)
|
|
48
|
+
|
|
49
|
+
| `#next_sibling`
|
|
50
|
+
| Returns the next sibling node
|
|
51
|
+
| ✅ Yes (may be nil)
|
|
52
|
+
|
|
53
|
+
| `#previous_sibling`
|
|
54
|
+
| Returns the previous sibling node
|
|
55
|
+
| ✅ Yes (may be nil)
|
|
56
|
+
|===
|
|
57
|
+
|
|
58
|
+
=== Tree Manipulation
|
|
59
|
+
|
|
60
|
+
[cols="1,3,1"]
|
|
61
|
+
|===
|
|
62
|
+
| Method | Description | Always Available?
|
|
63
|
+
|
|
64
|
+
| `#add_child(node)`
|
|
65
|
+
| Adds a child node
|
|
66
|
+
| ✅ Yes
|
|
67
|
+
|
|
68
|
+
| `#add_previous_sibling(node)`
|
|
69
|
+
| Adds a sibling before this node
|
|
70
|
+
| ✅ Yes
|
|
71
|
+
|
|
72
|
+
| `#add_next_sibling(node)`
|
|
73
|
+
| Adds a sibling after this node
|
|
74
|
+
| ✅ Yes
|
|
75
|
+
|
|
76
|
+
| `#remove`
|
|
77
|
+
| Removes this node from the tree
|
|
78
|
+
| ✅ Yes
|
|
79
|
+
|
|
80
|
+
| `#replace(node)`
|
|
81
|
+
| Replaces this node with another
|
|
82
|
+
| ✅ Yes
|
|
83
|
+
|===
|
|
84
|
+
|
|
85
|
+
=== Serialization
|
|
86
|
+
|
|
87
|
+
[cols="1,3,1"]
|
|
88
|
+
|===
|
|
89
|
+
| Method | Description | Always Available?
|
|
90
|
+
|
|
91
|
+
| `#to_xml(options = {})`
|
|
92
|
+
| Serializes node to XML string
|
|
93
|
+
| ✅ Yes
|
|
94
|
+
|
|
95
|
+
| `#clone` / `#dup`
|
|
96
|
+
| Creates a deep copy of the node
|
|
97
|
+
| ✅ Yes
|
|
98
|
+
|===
|
|
99
|
+
|
|
100
|
+
=== XPath Queries
|
|
101
|
+
|
|
102
|
+
[cols="1,3,1"]
|
|
103
|
+
|===
|
|
104
|
+
| Method | Description | Always Available?
|
|
105
|
+
|
|
106
|
+
| `#xpath(expression, ns = {})`
|
|
107
|
+
| Returns NodeSet matching XPath
|
|
108
|
+
| ✅ Yes (adapter-dependent)
|
|
109
|
+
|
|
110
|
+
| `#at_xpath(expression, ns = {})`
|
|
111
|
+
| Returns first node matching XPath
|
|
112
|
+
| ✅ Yes (adapter-dependent)
|
|
113
|
+
|===
|
|
114
|
+
|
|
115
|
+
=== Type Checking
|
|
116
|
+
|
|
117
|
+
[cols="1,3,1"]
|
|
118
|
+
|===
|
|
119
|
+
| Method | Description | Always Available?
|
|
120
|
+
|
|
121
|
+
| `#element?`
|
|
122
|
+
| Returns true if node is an Element
|
|
123
|
+
| ✅ Yes
|
|
124
|
+
|
|
125
|
+
| `#text?`
|
|
126
|
+
| Returns true if node is a Text node
|
|
127
|
+
| ✅ Yes
|
|
128
|
+
|
|
129
|
+
| `#cdata?`
|
|
130
|
+
| Returns true if node is a CDATA section
|
|
131
|
+
| ✅ Yes
|
|
132
|
+
|
|
133
|
+
| `#comment?`
|
|
134
|
+
| Returns true if node is a Comment
|
|
135
|
+
| ✅ Yes
|
|
136
|
+
|
|
137
|
+
| `#processing_instruction?`
|
|
138
|
+
| Returns true if node is a PI
|
|
139
|
+
| ✅ Yes
|
|
140
|
+
|
|
141
|
+
| `#document?`
|
|
142
|
+
| Returns true if node is a Document
|
|
143
|
+
| ✅ Yes
|
|
144
|
+
|
|
145
|
+
| `#declaration?`
|
|
146
|
+
| Returns true if node is a Declaration
|
|
147
|
+
| ✅ Yes
|
|
148
|
+
|
|
149
|
+
| `#doctype?`
|
|
150
|
+
| Returns true if node is a Doctype
|
|
151
|
+
| ✅ Yes
|
|
152
|
+
|
|
153
|
+
| `#attribute?`
|
|
154
|
+
| Returns true if node is an Attribute
|
|
155
|
+
| ✅ Yes
|
|
156
|
+
|===
|
|
157
|
+
|
|
158
|
+
== Node Type-Specific APIs
|
|
159
|
+
|
|
160
|
+
=== Document
|
|
161
|
+
|
|
162
|
+
The root document node.
|
|
163
|
+
|
|
164
|
+
==== Additional Methods
|
|
165
|
+
|
|
166
|
+
[cols="1,3,1"]
|
|
167
|
+
|===
|
|
168
|
+
| Method | Description | Always Available?
|
|
169
|
+
|
|
170
|
+
| `#root`
|
|
171
|
+
| Returns the root element
|
|
172
|
+
| ✅ Yes
|
|
173
|
+
|
|
174
|
+
| `#root=(element)`
|
|
175
|
+
| Sets the root element
|
|
176
|
+
| ✅ Yes
|
|
177
|
+
|
|
178
|
+
| `#encoding`
|
|
179
|
+
| Returns document encoding
|
|
180
|
+
| ✅ Yes
|
|
181
|
+
|
|
182
|
+
| `#create_element(name, content = nil)`
|
|
183
|
+
| Creates a new element
|
|
184
|
+
| ✅ Yes
|
|
185
|
+
|
|
186
|
+
| `#create_text(content)`
|
|
187
|
+
| Creates a new text node
|
|
188
|
+
| ✅ Yes
|
|
189
|
+
|
|
190
|
+
| `#create_comment(content)`
|
|
191
|
+
| Creates a new comment
|
|
192
|
+
| ✅ Yes
|
|
193
|
+
|
|
194
|
+
| `#create_cdata(content)`
|
|
195
|
+
| Creates a new CDATA section
|
|
196
|
+
| ✅ Yes
|
|
197
|
+
|
|
198
|
+
| `#create_processing_instruction(target, content)`
|
|
199
|
+
| Creates a new processing instruction
|
|
200
|
+
| ✅ Yes
|
|
201
|
+
|
|
202
|
+
| `#create_declaration(version, encoding, standalone)`
|
|
203
|
+
| Creates a new XML declaration
|
|
204
|
+
| ✅ Yes (adapter-dependent)
|
|
205
|
+
|===
|
|
206
|
+
|
|
207
|
+
=== Element
|
|
208
|
+
|
|
209
|
+
Elements are the primary structural nodes with tag names, attributes, and children.
|
|
210
|
+
|
|
211
|
+
==== Identity Methods
|
|
212
|
+
|
|
213
|
+
[cols="1,3,1"]
|
|
214
|
+
|===
|
|
215
|
+
| Method | Description | Always Available?
|
|
216
|
+
|
|
217
|
+
| `#name`
|
|
218
|
+
| Returns the element tag name
|
|
219
|
+
| ✅ Yes
|
|
220
|
+
|
|
221
|
+
| `#name=(value)`
|
|
222
|
+
| Sets the element tag name
|
|
223
|
+
| ✅ Yes
|
|
224
|
+
|
|
225
|
+
| `#identifier`
|
|
226
|
+
| Returns the primary identifier (same as #name)
|
|
227
|
+
| ✅ Yes
|
|
228
|
+
|===
|
|
229
|
+
|
|
230
|
+
==== Namespace Methods
|
|
231
|
+
|
|
232
|
+
[cols="1,3,1"]
|
|
233
|
+
|===
|
|
234
|
+
| Method | Description | Always Available?
|
|
235
|
+
|
|
236
|
+
| `#namespace`
|
|
237
|
+
| Returns the element's namespace
|
|
238
|
+
| ✅ Yes (may be nil)
|
|
239
|
+
|
|
240
|
+
| `#namespace=(ns_or_hash)`
|
|
241
|
+
| Sets the element's namespace
|
|
242
|
+
| ✅ Yes
|
|
243
|
+
|
|
244
|
+
| `#namespace_prefix`
|
|
245
|
+
| Returns the namespace prefix
|
|
246
|
+
| ✅ Yes (may be nil)
|
|
247
|
+
|
|
248
|
+
| `#namespace_uri`
|
|
249
|
+
| Returns the namespace URI
|
|
250
|
+
| ✅ Yes (may be nil)
|
|
251
|
+
|
|
252
|
+
| `#namespaces`
|
|
253
|
+
| Returns all namespace definitions
|
|
254
|
+
| ✅ Yes
|
|
255
|
+
|
|
256
|
+
| `#add_namespace(prefix, uri)`
|
|
257
|
+
| Adds a namespace definition
|
|
258
|
+
| ✅ Yes
|
|
259
|
+
|===
|
|
260
|
+
|
|
261
|
+
==== Attribute Methods
|
|
262
|
+
|
|
263
|
+
[cols="1,3,1"]
|
|
264
|
+
|===
|
|
265
|
+
| Method | Description | Always Available?
|
|
266
|
+
|
|
267
|
+
| `#[](name)`
|
|
268
|
+
| Gets attribute value
|
|
269
|
+
| ✅ Yes
|
|
270
|
+
|
|
271
|
+
| `#[]=(name, value)`
|
|
272
|
+
| Sets attribute value
|
|
273
|
+
| ✅ Yes
|
|
274
|
+
|
|
275
|
+
| `#attribute(name)`
|
|
276
|
+
| Returns Attribute object
|
|
277
|
+
| ✅ Yes (may be nil)
|
|
278
|
+
|
|
279
|
+
| `#attributes`
|
|
280
|
+
| Returns array of all attributes
|
|
281
|
+
| ✅ Yes
|
|
282
|
+
|
|
283
|
+
| `#remove_attribute(name)`
|
|
284
|
+
| Removes an attribute
|
|
285
|
+
| ✅ Yes
|
|
286
|
+
|===
|
|
287
|
+
|
|
288
|
+
==== Content Methods
|
|
289
|
+
|
|
290
|
+
[cols="1,3,1"]
|
|
291
|
+
|===
|
|
292
|
+
| Method | Description | Always Available?
|
|
293
|
+
|
|
294
|
+
| `#text`
|
|
295
|
+
| Returns text content
|
|
296
|
+
| ✅ Yes
|
|
297
|
+
|
|
298
|
+
| `#text=(content)`
|
|
299
|
+
| Sets text content
|
|
300
|
+
| ✅ Yes
|
|
301
|
+
|
|
302
|
+
| `#inner_text`
|
|
303
|
+
| Returns inner text (concatenated)
|
|
304
|
+
| ✅ Yes
|
|
305
|
+
|
|
306
|
+
| `#inner_xml`
|
|
307
|
+
| Returns inner XML as string
|
|
308
|
+
| ✅ Yes
|
|
309
|
+
|
|
310
|
+
| `#inner_xml=(xml)`
|
|
311
|
+
| Sets inner XML from string
|
|
312
|
+
| ✅ Yes
|
|
313
|
+
|===
|
|
314
|
+
|
|
315
|
+
=== Attribute
|
|
316
|
+
|
|
317
|
+
Attributes are name-value pairs attached to elements.
|
|
318
|
+
|
|
319
|
+
==== Identity Methods
|
|
320
|
+
|
|
321
|
+
[cols="1,3,1"]
|
|
322
|
+
|===
|
|
323
|
+
| Method | Description | Always Available?
|
|
324
|
+
|
|
325
|
+
| `#name`
|
|
326
|
+
| Returns the attribute name
|
|
327
|
+
| ✅ Yes
|
|
328
|
+
|
|
329
|
+
| `#name=(new_name)`
|
|
330
|
+
| Sets the attribute name
|
|
331
|
+
| ✅ Yes
|
|
332
|
+
|
|
333
|
+
| `#identifier`
|
|
334
|
+
| Returns the primary identifier (same as #name)
|
|
335
|
+
| ✅ Yes
|
|
336
|
+
|===
|
|
337
|
+
|
|
338
|
+
==== Value Methods
|
|
339
|
+
|
|
340
|
+
[cols="1,3,1"]
|
|
341
|
+
|===
|
|
342
|
+
| Method | Description | Always Available?
|
|
343
|
+
|
|
344
|
+
| `#value`
|
|
345
|
+
| Returns the attribute value
|
|
346
|
+
| ✅ Yes
|
|
347
|
+
|
|
348
|
+
| `#value=(new_value)`
|
|
349
|
+
| Sets the attribute value
|
|
350
|
+
| ✅ Yes
|
|
351
|
+
|
|
352
|
+
| `#text`
|
|
353
|
+
| Alias for #value (XPath compatibility)
|
|
354
|
+
| ✅ Yes
|
|
355
|
+
|===
|
|
356
|
+
|
|
357
|
+
==== Relationship Methods
|
|
358
|
+
|
|
359
|
+
[cols="1,3,1"]
|
|
360
|
+
|===
|
|
361
|
+
| Method | Description | Always Available?
|
|
362
|
+
|
|
363
|
+
| `#element`
|
|
364
|
+
| Returns the owning element
|
|
365
|
+
| ✅ Yes
|
|
366
|
+
|
|
367
|
+
| `#namespace`
|
|
368
|
+
| Returns the attribute's namespace
|
|
369
|
+
| ✅ Yes (may be nil)
|
|
370
|
+
|===
|
|
371
|
+
|
|
372
|
+
=== Text
|
|
373
|
+
|
|
374
|
+
Text nodes contain character data.
|
|
375
|
+
|
|
376
|
+
==== Content Methods
|
|
377
|
+
|
|
378
|
+
[cols="1,3,1"]
|
|
379
|
+
|===
|
|
380
|
+
| Method | Description | Always Available?
|
|
381
|
+
|
|
382
|
+
| `#content`
|
|
383
|
+
| Returns the text content
|
|
384
|
+
| ✅ Yes
|
|
385
|
+
|
|
386
|
+
| `#content=(text)`
|
|
387
|
+
| Sets the text content
|
|
388
|
+
| ✅ Yes
|
|
389
|
+
|
|
390
|
+
| `#text`
|
|
391
|
+
| Alias for #content
|
|
392
|
+
| ✅ Yes
|
|
393
|
+
|===
|
|
394
|
+
|
|
395
|
+
==== Identity Methods
|
|
396
|
+
|
|
397
|
+
[cols="1,3,1"]
|
|
398
|
+
|===
|
|
399
|
+
| Method | Description | Always Available?
|
|
400
|
+
|
|
401
|
+
| `#identifier`
|
|
402
|
+
| Returns nil (content nodes have no identifier)
|
|
403
|
+
| ✅ Yes
|
|
404
|
+
|===
|
|
405
|
+
|
|
406
|
+
=== Cdata
|
|
407
|
+
|
|
408
|
+
CDATA sections contain character data that the parser must not interpret as markup.
|
|
409
|
+
|
|
410
|
+
==== Content Methods
|
|
411
|
+
|
|
412
|
+
[cols="1,3,1"]
|
|
413
|
+
|===
|
|
414
|
+
| Method | Description | Always Available?
|
|
415
|
+
|
|
416
|
+
| `#content`
|
|
417
|
+
| Returns the CDATA content
|
|
418
|
+
| ✅ Yes
|
|
419
|
+
|
|
420
|
+
| `#content=(text)`
|
|
421
|
+
| Sets the CDATA content
|
|
422
|
+
| ✅ Yes
|
|
423
|
+
|
|
424
|
+
| `#text`
|
|
425
|
+
| Alias for #content
|
|
426
|
+
| ✅ Yes
|
|
427
|
+
|===
|
|
428
|
+
|
|
429
|
+
==== Identity Methods
|
|
430
|
+
|
|
431
|
+
[cols="1,3,1"]
|
|
432
|
+
|===
|
|
433
|
+
| Method | Description | Always Available?
|
|
434
|
+
|
|
435
|
+
| `#identifier`
|
|
436
|
+
| Returns nil (content nodes have no identifier)
|
|
437
|
+
| ✅ Yes
|
|
438
|
+
|===
|
|
439
|
+
|
|
440
|
+
=== Comment
|
|
441
|
+
|
|
442
|
+
Comment nodes contain XML comments.
|
|
443
|
+
|
|
444
|
+
==== Content Methods
|
|
445
|
+
|
|
446
|
+
[cols="1,3,1"]
|
|
447
|
+
|===
|
|
448
|
+
| Method | Description | Always Available?
|
|
449
|
+
|
|
450
|
+
| `#content`
|
|
451
|
+
| Returns the comment text
|
|
452
|
+
| ✅ Yes
|
|
453
|
+
|
|
454
|
+
| `#content=(text)`
|
|
455
|
+
| Sets the comment text
|
|
456
|
+
| ✅ Yes
|
|
457
|
+
|
|
458
|
+
| `#text`
|
|
459
|
+
| Alias for #content
|
|
460
|
+
| ✅ Yes
|
|
461
|
+
|===
|
|
462
|
+
|
|
463
|
+
==== Identity Methods
|
|
464
|
+
|
|
465
|
+
[cols="1,3,1"]
|
|
466
|
+
|===
|
|
467
|
+
| Method | Description | Always Available?
|
|
468
|
+
|
|
469
|
+
| `#identifier`
|
|
470
|
+
| Returns nil (content nodes have no identifier)
|
|
471
|
+
| ✅ Yes
|
|
472
|
+
|===
|
|
473
|
+
|
|
474
|
+
=== ProcessingInstruction
|
|
475
|
+
|
|
476
|
+
Processing instructions provide directives to applications.
|
|
477
|
+
|
|
478
|
+
==== Identity Methods
|
|
479
|
+
|
|
480
|
+
[cols="1,3,1"]
|
|
481
|
+
|===
|
|
482
|
+
| Method | Description | Always Available?
|
|
483
|
+
|
|
484
|
+
| `#target`
|
|
485
|
+
| Returns the PI target
|
|
486
|
+
| ✅ Yes
|
|
487
|
+
|
|
488
|
+
| `#target=(new_target)`
|
|
489
|
+
| Sets the PI target
|
|
490
|
+
| ✅ Yes
|
|
491
|
+
|
|
492
|
+
| `#identifier`
|
|
493
|
+
| Returns the primary identifier (same as #target)
|
|
494
|
+
| ✅ Yes
|
|
495
|
+
|===
|
|
496
|
+
|
|
497
|
+
==== Content Methods
|
|
498
|
+
|
|
499
|
+
[cols="1,3,1"]
|
|
500
|
+
|===
|
|
501
|
+
| Method | Description | Always Available?
|
|
502
|
+
|
|
503
|
+
| `#content`
|
|
504
|
+
| Returns the PI content
|
|
505
|
+
| ✅ Yes
|
|
506
|
+
|
|
507
|
+
| `#content=(new_content)`
|
|
508
|
+
| Sets the PI content
|
|
509
|
+
| ✅ Yes
|
|
510
|
+
|===
|
|
511
|
+
|
|
512
|
+
=== Declaration
|
|
513
|
+
|
|
514
|
+
XML declarations specify document metadata.
|
|
515
|
+
|
|
516
|
+
==== Property Methods
|
|
517
|
+
|
|
518
|
+
[cols="1,3,1"]
|
|
519
|
+
|===
|
|
520
|
+
| Method | Description | Always Available?
|
|
521
|
+
|
|
522
|
+
| `#version`
|
|
523
|
+
| Returns XML version (e.g., "1.0")
|
|
524
|
+
| ✅ Yes
|
|
525
|
+
|
|
526
|
+
| `#version=(new_version)`
|
|
527
|
+
| Sets XML version
|
|
528
|
+
| ✅ Yes
|
|
529
|
+
|
|
530
|
+
| `#encoding`
|
|
531
|
+
| Returns character encoding
|
|
532
|
+
| ✅ Yes (may be nil)
|
|
533
|
+
|
|
534
|
+
| `#encoding=(new_encoding)`
|
|
535
|
+
| Sets character encoding
|
|
536
|
+
| ✅ Yes
|
|
537
|
+
|
|
538
|
+
| `#standalone`
|
|
539
|
+
| Returns standalone status
|
|
540
|
+
| ✅ Yes (may be nil)
|
|
541
|
+
|
|
542
|
+
| `#standalone=(new_standalone)`
|
|
543
|
+
| Sets standalone status
|
|
544
|
+
| ✅ Yes
|
|
545
|
+
|===
|
|
546
|
+
|
|
547
|
+
==== Identity Methods
|
|
548
|
+
|
|
549
|
+
[cols="1,3,1"]
|
|
550
|
+
|===
|
|
551
|
+
| Method | Description | Always Available?
|
|
552
|
+
|
|
553
|
+
| `#identifier`
|
|
554
|
+
| Returns nil (declarations have no identifier)
|
|
555
|
+
| ✅ Yes
|
|
556
|
+
|===
|
|
557
|
+
|
|
558
|
+
=== Doctype
|
|
559
|
+
|
|
560
|
+
Document type declarations specify DTD information.
|
|
561
|
+
|
|
562
|
+
WARNING: Doctype accessor methods are **not fully implemented** across all adapters. The availability of these methods depends on the specific adapter being used.
|
|
563
|
+
|
|
564
|
+
==== Identity Methods
|
|
565
|
+
|
|
566
|
+
[cols="1,1,1,3"]
|
|
567
|
+
|===
|
|
568
|
+
|Doctype |`#name` |✅ Yes |Returns DOCTYPE name (root element)
|
|
569
|
+
|Doctype |`#external_id` |✅ Yes |Returns PUBLIC identifier
|
|
570
|
+
|Doctype |`#system_id` |✅ Yes |Returns SYSTEM identifier (DTD URI)
|
|
571
|
+
|Doctype |`#identifier` |✅ Yes |Returns DOCTYPE name (same as `#name`)
|
|
572
|
+
|===
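
The `#identifier` convention documented in the tables above (name for elements, target for processing instructions, `nil` for content nodes) can be sketched with plain Ruby stand-ins. The `Sketch*` structs and `describe_nodes` are hypothetical and exist only for this illustration; they are not Moxml classes.

```ruby
# Hypothetical stand-ins for illustration; these are not Moxml classes.
SketchElement = Struct.new(:name) do
  # Elements identify themselves by tag name.
  def identifier
    name
  end
end

SketchProcessingInstruction = Struct.new(:target, :content) do
  # Processing instructions identify themselves by target.
  def identifier
    target
  end
end

SketchText = Struct.new(:content) do
  # Content nodes have no identifier.
  def identifier
    nil
  end
end

# Every node answers #identifier, so callers need no per-type branching.
def describe_nodes(nodes)
  nodes.map { |node| node.identifier || "(no identifier)" }
end

nodes = [
  SketchElement.new("book"),
  SketchProcessingInstruction.new("xml-stylesheet", 'href="style.xsl"'),
  SketchText.new("hello")
]
p describe_nodes(nodes)
# → ["book", "xml-stylesheet", "(no identifier)"]
```

The point of a uniform method is that generic code, such as logging or tree dumps, can call it on any node without first checking the node type.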
|
|
@@ -439,12 +439,12 @@ doc.to_xml # Use automatic preservation
|
|
|
439
439
|
Controls whether XML declaration is included in serialized output:
|
|
440
440
|
|
|
441
441
|
* `true`: Always include declaration
|
|
442
|
-
* `false`: Never include declaration
|
|
442
|
+
* `false`: Never include declaration
|
|
443
443
|
* Not specified: Use `has_xml_declaration` value (automatic preservation)
|
|
444
444
|
|
|
445
445
|
== See Also
|
|
446
446
|
|
|
447
|
-
* link:parsing-xml
|
|
448
|
-
* link:modifying-xml
|
|
449
|
-
* link:../adapters/index
|
|
450
|
-
* link:../best-practices
|
|
447
|
+
* link:parsing-xml[Parsing XML] - How to parse XML documents
|
|
448
|
+
* link:modifying-xml[Modifying XML] - Working with parsed documents
|
|
449
|
+
* link:../adapters/index[Adapters] - Adapter-specific behavior
|
|
450
|
+
* link:../best-practices[Best Practices] - General XML processing guidelines
|