moxml 0.1.7 → 0.1.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.github/workflows/dependent-repos.json +5 -0
- data/.github/workflows/dependent-tests.yml +20 -0
- data/.github/workflows/docs.yml +59 -0
- data/.github/workflows/rake.yml +10 -10
- data/.github/workflows/release.yml +5 -3
- data/.gitignore +37 -0
- data/.rubocop.yml +15 -7
- data/.rubocop_todo.yml +238 -40
- data/Gemfile +14 -9
- data/LICENSE.md +6 -2
- data/README.adoc +535 -373
- data/Rakefile +53 -0
- data/benchmarks/.gitignore +6 -0
- data/benchmarks/generate_report.rb +550 -0
- data/docs/Gemfile +13 -0
- data/docs/_config.yml +138 -0
- data/docs/_guides/advanced-features.adoc +87 -0
- data/docs/_guides/development-testing.adoc +165 -0
- data/docs/_guides/index.adoc +45 -0
- data/docs/_guides/modifying-xml.adoc +293 -0
- data/docs/_guides/parsing-xml.adoc +231 -0
- data/docs/_guides/sax-parsing.adoc +603 -0
- data/docs/_guides/working-with-documents.adoc +118 -0
- data/docs/_pages/adapter-compatibility.adoc +369 -0
- data/docs/_pages/adapters/headed-ox.adoc +237 -0
- data/docs/_pages/adapters/index.adoc +98 -0
- data/docs/_pages/adapters/libxml.adoc +286 -0
- data/docs/_pages/adapters/nokogiri.adoc +252 -0
- data/docs/_pages/adapters/oga.adoc +292 -0
- data/docs/_pages/adapters/ox.adoc +55 -0
- data/docs/_pages/adapters/rexml.adoc +293 -0
- data/docs/_pages/best-practices.adoc +430 -0
- data/docs/_pages/compatibility.adoc +468 -0
- data/docs/_pages/configuration.adoc +251 -0
- data/docs/_pages/error-handling.adoc +350 -0
- data/docs/_pages/headed-ox-limitations.adoc +558 -0
- data/docs/_pages/headed-ox.adoc +1025 -0
- data/docs/_pages/index.adoc +35 -0
- data/docs/_pages/installation.adoc +141 -0
- data/docs/_pages/node-api-reference.adoc +50 -0
- data/docs/_pages/performance.adoc +36 -0
- data/docs/_pages/quick-start.adoc +244 -0
- data/docs/_pages/thread-safety.adoc +29 -0
- data/docs/_references/document-api.adoc +408 -0
- data/docs/_references/index.adoc +48 -0
- data/docs/_tutorials/basic-usage.adoc +268 -0
- data/docs/_tutorials/builder-pattern.adoc +343 -0
- data/docs/_tutorials/index.adoc +33 -0
- data/docs/_tutorials/namespace-handling.adoc +325 -0
- data/docs/_tutorials/xpath-queries.adoc +359 -0
- data/docs/index.adoc +122 -0
- data/examples/README.md +124 -0
- data/examples/api_client/README.md +424 -0
- data/examples/api_client/api_client.rb +394 -0
- data/examples/api_client/example_response.xml +48 -0
- data/examples/headed_ox_example/README.md +90 -0
- data/examples/headed_ox_example/headed_ox_demo.rb +71 -0
- data/examples/rss_parser/README.md +194 -0
- data/examples/rss_parser/example_feed.xml +93 -0
- data/examples/rss_parser/rss_parser.rb +189 -0
- data/examples/sax_parsing/README.md +50 -0
- data/examples/sax_parsing/data_extractor.rb +75 -0
- data/examples/sax_parsing/example.xml +21 -0
- data/examples/sax_parsing/large_file.rb +78 -0
- data/examples/sax_parsing/simple_parser.rb +55 -0
- data/examples/web_scraper/README.md +352 -0
- data/examples/web_scraper/example_page.html +201 -0
- data/examples/web_scraper/web_scraper.rb +312 -0
- data/lib/moxml/adapter/base.rb +107 -28
- data/lib/moxml/adapter/customized_libxml/cdata.rb +28 -0
- data/lib/moxml/adapter/customized_libxml/comment.rb +24 -0
- data/lib/moxml/adapter/customized_libxml/declaration.rb +85 -0
- data/lib/moxml/adapter/customized_libxml/element.rb +39 -0
- data/lib/moxml/adapter/customized_libxml/node.rb +44 -0
- data/lib/moxml/adapter/customized_libxml/processing_instruction.rb +31 -0
- data/lib/moxml/adapter/customized_libxml/text.rb +27 -0
- data/lib/moxml/adapter/customized_oga/xml_generator.rb +1 -1
- data/lib/moxml/adapter/customized_ox/attribute.rb +28 -1
- data/lib/moxml/adapter/customized_rexml/formatter.rb +11 -6
- data/lib/moxml/adapter/headed_ox.rb +161 -0
- data/lib/moxml/adapter/libxml.rb +1548 -0
- data/lib/moxml/adapter/nokogiri.rb +121 -9
- data/lib/moxml/adapter/oga.rb +123 -12
- data/lib/moxml/adapter/ox.rb +282 -26
- data/lib/moxml/adapter/rexml.rb +127 -20
- data/lib/moxml/adapter.rb +21 -4
- data/lib/moxml/attribute.rb +6 -0
- data/lib/moxml/builder.rb +40 -4
- data/lib/moxml/config.rb +8 -3
- data/lib/moxml/context.rb +39 -1
- data/lib/moxml/doctype.rb +13 -1
- data/lib/moxml/document.rb +39 -6
- data/lib/moxml/document_builder.rb +27 -5
- data/lib/moxml/element.rb +71 -2
- data/lib/moxml/error.rb +175 -6
- data/lib/moxml/node.rb +94 -3
- data/lib/moxml/node_set.rb +34 -0
- data/lib/moxml/sax/block_handler.rb +194 -0
- data/lib/moxml/sax/element_handler.rb +124 -0
- data/lib/moxml/sax/handler.rb +113 -0
- data/lib/moxml/sax.rb +31 -0
- data/lib/moxml/version.rb +1 -1
- data/lib/moxml/xml_utils/encoder.rb +4 -4
- data/lib/moxml/xml_utils.rb +7 -4
- data/lib/moxml/xpath/ast/node.rb +159 -0
- data/lib/moxml/xpath/cache.rb +91 -0
- data/lib/moxml/xpath/compiler.rb +1768 -0
- data/lib/moxml/xpath/context.rb +26 -0
- data/lib/moxml/xpath/conversion.rb +124 -0
- data/lib/moxml/xpath/engine.rb +52 -0
- data/lib/moxml/xpath/errors.rb +101 -0
- data/lib/moxml/xpath/lexer.rb +304 -0
- data/lib/moxml/xpath/parser.rb +485 -0
- data/lib/moxml/xpath/ruby/generator.rb +269 -0
- data/lib/moxml/xpath/ruby/node.rb +193 -0
- data/lib/moxml/xpath.rb +37 -0
- data/lib/moxml.rb +5 -2
- data/moxml.gemspec +3 -1
- data/old-specs/moxml/adapter/customized_libxml/.gitkeep +6 -0
- data/spec/consistency/README.md +77 -0
- data/spec/{moxml/examples/adapter_spec.rb → consistency/adapter_parity_spec.rb} +4 -4
- data/spec/examples/README.md +75 -0
- data/spec/{support/shared_examples/examples/attribute.rb → examples/attribute_examples_spec.rb} +1 -1
- data/spec/{support/shared_examples/examples/basic_usage.rb → examples/basic_usage_spec.rb} +2 -2
- data/spec/{support/shared_examples/examples/namespace.rb → examples/namespace_examples_spec.rb} +3 -3
- data/spec/{support/shared_examples/examples/readme_examples.rb → examples/readme_examples_spec.rb} +6 -4
- data/spec/{support/shared_examples/examples/xpath.rb → examples/xpath_examples_spec.rb} +10 -6
- data/spec/integration/README.md +71 -0
- data/spec/{moxml/all_with_adapters_spec.rb → integration/all_adapters_spec.rb} +3 -2
- data/spec/integration/headed_ox_integration_spec.rb +326 -0
- data/spec/{support → integration}/shared_examples/edge_cases.rb +37 -10
- data/spec/integration/shared_examples/high_level/.gitkeep +0 -0
- data/spec/{support/shared_examples/context.rb → integration/shared_examples/high_level/context_behavior.rb} +2 -1
- data/spec/{support/shared_examples/integration.rb → integration/shared_examples/integration_workflows.rb} +23 -6
- data/spec/integration/shared_examples/node_wrappers/.gitkeep +0 -0
- data/spec/{support/shared_examples/cdata.rb → integration/shared_examples/node_wrappers/cdata_behavior.rb} +6 -1
- data/spec/{support/shared_examples/comment.rb → integration/shared_examples/node_wrappers/comment_behavior.rb} +2 -1
- data/spec/{support/shared_examples/declaration.rb → integration/shared_examples/node_wrappers/declaration_behavior.rb} +5 -2
- data/spec/{support/shared_examples/doctype.rb → integration/shared_examples/node_wrappers/doctype_behavior.rb} +2 -2
- data/spec/{support/shared_examples/document.rb → integration/shared_examples/node_wrappers/document_behavior.rb} +1 -1
- data/spec/{support/shared_examples/node.rb → integration/shared_examples/node_wrappers/node_behavior.rb} +9 -2
- data/spec/{support/shared_examples/node_set.rb → integration/shared_examples/node_wrappers/node_set_behavior.rb} +1 -18
- data/spec/{support/shared_examples/processing_instruction.rb → integration/shared_examples/node_wrappers/processing_instruction_behavior.rb} +6 -2
- data/spec/moxml/README.md +41 -0
- data/spec/moxml/adapter/.gitkeep +0 -0
- data/spec/moxml/adapter/README.md +61 -0
- data/spec/moxml/adapter/base_spec.rb +27 -0
- data/spec/moxml/adapter/headed_ox_spec.rb +311 -0
- data/spec/moxml/adapter/libxml_spec.rb +14 -0
- data/spec/moxml/adapter/ox_spec.rb +9 -8
- data/spec/moxml/adapter/shared_examples/.gitkeep +0 -0
- data/spec/{support/shared_examples/xml_adapter.rb → moxml/adapter/shared_examples/adapter_contract.rb} +39 -12
- data/spec/moxml/adapter_spec.rb +16 -0
- data/spec/moxml/attribute_spec.rb +30 -0
- data/spec/moxml/builder_spec.rb +33 -0
- data/spec/moxml/cdata_spec.rb +31 -0
- data/spec/moxml/comment_spec.rb +31 -0
- data/spec/moxml/config_spec.rb +3 -3
- data/spec/moxml/context_spec.rb +28 -0
- data/spec/moxml/declaration_spec.rb +36 -0
- data/spec/moxml/doctype_spec.rb +33 -0
- data/spec/moxml/document_builder_spec.rb +30 -0
- data/spec/moxml/document_spec.rb +105 -0
- data/spec/moxml/element_spec.rb +143 -0
- data/spec/moxml/error_spec.rb +266 -22
- data/spec/{moxml_spec.rb → moxml/moxml_spec.rb} +9 -9
- data/spec/moxml/namespace_spec.rb +32 -0
- data/spec/moxml/node_set_spec.rb +39 -0
- data/spec/moxml/node_spec.rb +37 -0
- data/spec/moxml/processing_instruction_spec.rb +34 -0
- data/spec/moxml/sax_spec.rb +1067 -0
- data/spec/moxml/text_spec.rb +31 -0
- data/spec/moxml/version_spec.rb +14 -0
- data/spec/moxml/xml_utils/.gitkeep +0 -0
- data/spec/moxml/xml_utils/encoder_spec.rb +27 -0
- data/spec/moxml/xml_utils_spec.rb +49 -0
- data/spec/moxml/xpath/ast/node_spec.rb +83 -0
- data/spec/moxml/xpath/axes_spec.rb +296 -0
- data/spec/moxml/xpath/cache_spec.rb +358 -0
- data/spec/moxml/xpath/compiler_spec.rb +406 -0
- data/spec/moxml/xpath/context_spec.rb +210 -0
- data/spec/moxml/xpath/conversion_spec.rb +365 -0
- data/spec/moxml/xpath/fixtures/sample.xml +25 -0
- data/spec/moxml/xpath/functions/boolean_functions_spec.rb +114 -0
- data/spec/moxml/xpath/functions/node_functions_spec.rb +145 -0
- data/spec/moxml/xpath/functions/numeric_functions_spec.rb +164 -0
- data/spec/moxml/xpath/functions/position_functions_spec.rb +93 -0
- data/spec/moxml/xpath/functions/special_functions_spec.rb +89 -0
- data/spec/moxml/xpath/functions/string_functions_spec.rb +381 -0
- data/spec/moxml/xpath/lexer_spec.rb +488 -0
- data/spec/moxml/xpath/parser_integration_spec.rb +210 -0
- data/spec/moxml/xpath/parser_spec.rb +364 -0
- data/spec/moxml/xpath/ruby/generator_spec.rb +421 -0
- data/spec/moxml/xpath/ruby/node_spec.rb +291 -0
- data/spec/moxml/xpath_capabilities_spec.rb +199 -0
- data/spec/moxml/xpath_spec.rb +77 -0
- data/spec/performance/README.md +83 -0
- data/spec/performance/benchmark_spec.rb +64 -0
- data/spec/{support/shared_examples/examples/memory.rb → performance/memory_usage_spec.rb} +3 -1
- data/spec/{support/shared_examples/examples/thread_safety.rb → performance/thread_safety_spec.rb} +3 -1
- data/spec/performance/xpath_benchmark_spec.rb +259 -0
- data/spec/spec_helper.rb +58 -1
- data/spec/support/xml_matchers.rb +1 -1
- metadata +176 -34
- data/spec/support/shared_examples/examples/benchmark_spec.rb +0 -51
- /data/spec/{support/shared_examples/builder.rb → integration/shared_examples/high_level/builder_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/document_builder.rb → integration/shared_examples/high_level/document_builder_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/attribute.rb → integration/shared_examples/node_wrappers/attribute_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/element.rb → integration/shared_examples/node_wrappers/element_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/namespace.rb → integration/shared_examples/node_wrappers/namespace_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/text.rb → integration/shared_examples/node_wrappers/text_behavior.rb} +0 -0
data/README.adoc
CHANGED
@@ -41,6 +41,10 @@ https://github.com/opal/opal[Opal].
 
 Ox:: https://github.com/ohler55/ox[Ox], a fast XML parser.
 
+LibXML:: https://github.com/xml4r/libxml-ruby[libxml-ruby], Ruby bindings
+for the performant https://github.com/GNOME/libxml2[libxml2] C library.
+Alternative to Nokogiri with similar performance characteristics.
+
 === Feature table
 
 Moxml exercises its best effort to provide a consistent interface across basic
@@ -51,41 +55,408 @@ The following table summarizes the features supported by each library.
 NOTE: The checkmarks indicate support for the feature, while the footnotes
 provide additional context for specific features.
 
-[cols="1,1,1,1,3"]
+[cols="1,1,1,1,1,3"]
 |===
-|Feature |Nokogiri |Oga |REXML |Ox
+|Feature |Nokogiri |Oga |REXML |LibXML |Ox |HeadedOx
 
 |Parsing, serializing
 | ✅
 | ✅
 | ✅
 | ✅
+| ✅
+| ✅
+
+|SAX parsing
+| ✅ Full (10/10 events)
+| ✅ Full (10/10 events)
+| ✅ Full (10/10 events)
+| ✅ Full (10/10 events)
+| ⚠️ Core (4/10 events) See NOTE 7.
+| ⚠️ Core (4/10 events) See NOTE 7.
 
 |Node manipulation
 | ✅
 | ✅
 | ✅
+| ✅
+| ✅ See NOTE 1.
 | ✅ See NOTE 1.
 
 |Basic XPath
 | ✅
 | ✅
 | ✅
-
-Uses `locate`. See NOTE 2.
+| ✅
+| Uses Ox-specific API `locate`. See NOTE 2.
+| ✅ Full XPath 1.0. See NOTE 3.
 
 |XPath with namespaces
 | ✅
 | ✅
 | ❌
-
-Uses `locate`. See NOTE 2.
+| ✅
+| Uses Ox-specific API `locate`. See NOTE 2.
+| ⚠️ Basic. See NOTE 3.
+
+|===
 
+NOTE: Ox/HeadedOx: Text node replacement may fail in some cases due to internal
+node structure.
+
+NOTE: Limited XPath support via `locate()` method. See adapter limitations
+section.
+
+NOTE: HeadedOx provides full XPath 1.0 support via a pure Ruby XPath engine
+layered on top of Ox's C parser. See HeadedOx documentation for details.
+
+NOTE: Ox/HeadedOx SAX: Only core events supported (start_element, end_element, characters, errors). No separate CDATA, comment, or processing instruction events.
+
+== Adapter comparison
+
+=== Feature compatibility matrix
+
+[cols="3,1,1,1,1,1,1", options="header"]
+|===
+| Feature/Operation | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx
+
+| *Core Operations*
+|
+|
+|
+|
+|
+|
+
+| Parse XML string
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Parse XML file/IO
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Serialize to XML
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| *Element Operations*
+|
+|
+|
+|
+|
+|
+
+| Create elements
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Get/set attributes
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Add/remove children
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Replace nodes
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ⚠️ Limited^1^
+| ⚠️ Limited^1^
+
+| *Namespace Operations*
+|
+|
+|
+|
+|
+|
+
+| Add namespaces
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Default namespaces
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ⚠️ Basic
+| ⚠️ Basic
+
+| Namespace inheritance
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None
+| ❌ None^5^
+
+| Namespaced attributes
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ⚠️ Limited
+| ⚠️ Limited^5^
+
+| *XPath Queries*
+|
+|
+|
+|
+|
+|
+
+| Basic paths (`//element`)
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Attribute predicates (`[@id]`)
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ⚠️ Existence only^2^
+| ✅ Full
+
+| Attribute values (`[@id='123']`)
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None^3^
+| ✅ Full
+
+| Logical operators (`[@a and @b]`)
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None
+| ✅ Full
+
+| Position predicates (`[1]`, `[last()]`)
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None
+| ✅ Full
+
+| Text predicates (`[text()='x']`)
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None
+| ✅ Full
+
+| Namespace-aware queries
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None
+| ⚠️ Basic^5^
+
+| Parent axis (`..`)
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None
+| ✅ Full
+
+| Sibling axes
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None
+| ❌ None^5^
+
+| XPath functions (`count()`, etc.)
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ❌ None
+| ✅ All 27
+
+| *Special Content*
+|
+|
+|
+|
+|
+|
+
+| CDATA sections
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Comments
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| Processing instructions
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ✅ Full
+
+| DOCTYPE declarations
+| ✅ Full
+| ✅ Full
+| ✅ Full
+| ⚠️ Limited^4^
+| ✅ Full
+| ✅ Full
+
+| *Performance*
+|
+|
+|
+|
+|
+|
+
+| Parse speed
+| Fast
+| Fast
+| Medium
+| Fast
+| Very Fast
+| Very Fast
+
+| Serialize speed
+| Fast
+| Fast
+| Medium
+| Medium
+| Very Fast
+| Very Fast
+
+| Memory usage
+| Good
+| Medium
+| Medium
+| Good
+| Excellent
+| Excellent
+
+| Thread safety
+| ✅ Yes
+| ✅ Yes
+| ✅ Yes
+| ✅ Yes
+| ✅ Yes
+| ✅ Yes
 |===
 
-
+^1^ Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure +
+^2^ Ox: `//book[@id]` works (returns all book elements), but doesn't filter by attribute existence +
+^3^ HeadedOx: Full XPath 1.0 with all 27 functions and 6 axes. Pure Ruby XPath engine on Ox's C parser. 99.20% pass rate. See link:docs/headed-ox.adoc[] +
+^4^ Ox: Use `.find { |el| el["id"] == "123" }` instead of XPath attribute value predicates +
+^5^ LibXML: DOCTYPE parsing works, serialization is limited (no round-trip preservation) +
+^6^ HeadedOx limitations: Namespace introspection and 7 axes not implemented. See link:docs/HEADED_OX_LIMITATIONS.md[]
+
+=== Adapter selection guide
+
+*Choose Nokogiri when:*
+
+* You need industry-standard compatibility
+* Large community support is important
+* C extension performance is acceptable
+* Cross-platform deployment is required
+
+*Choose Oga when:*
+
+* Pure Ruby environment is required (JRuby, TruffleRuby)
+* Best test coverage is needed (98%)
+* No C extensions are allowed
+* Memory usage is not the primary concern
+
+*Choose REXML when:*
+
+* Standard library only (no external gems)
+* Maximum portability is required
+* Small to medium documents
+* Deployment simplicity is critical
+
+*Choose LibXML when:*
+
+* Alternative to Nokogiri is desired
+* Full namespace support is required
+* Good performance with correctness
+* Native C extension is acceptable
+
+*Choose Ox when:*
+
+* Maximum parsing speed is critical
+* Simple document structures (limited nesting)
+* XPath usage is minimal or absent
+* Memory efficiency is paramount
+
+*Choose HeadedOx when:*
+
+* Need Ox's fast parsing with full XPath support
+* Want comprehensive XPath 1.0 features (functions, predicates)
+* Prefer pure Ruby XPath implementation for debugging
+* Need more XPath capabilities than standard Ox provides
+* Memory efficiency is important but XPath features are required
+
+CAUTION: Ox's custom XPath engine supports common patterns but may not handle
+complex XPath expressions. Test thoroughly if your use case requires advanced
+XPath.
 
-NOTE 2: The native Ox method `locate` is similar to XPath but has a different syntax.
 
 == Getting started
 
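One of the footnotes added in the hunk above recommends, for the Ox adapter, filtering with `.find { |el| el["id"] == "123" }` instead of an XPath attribute-value predicate. The sketch below illustrates that pattern in plain Ruby. `FakeElement` is a hypothetical stand-in that models only attribute lookup through `[]`. It is not a Moxml or Ox class, and real element objects may behave differently.

```ruby
# Hypothetical stand-in for an XML element (NOT a Moxml or Ox class).
# It models only attribute lookup through [], which is all the
# footnote's workaround relies on.
class FakeElement
  attr_reader :name

  def initialize(name, attributes = {})
    @name = name
    @attributes = attributes
  end

  # Returns the attribute value, or nil when the attribute is absent.
  def [](attr_name)
    @attributes[attr_name]
  end
end

# Pretend this is what a broad query such as //book returned.
books = [
  FakeElement.new("book", { "id" => "122" }),
  FakeElement.new("book", { "id" => "123" }),
  FakeElement.new("book")
]

# Instead of the XPath predicate [@id='123'], filter in Ruby.
match = books.find { |el| el["id"] == "123" }

# Instead of the existence predicate [@id], keep elements that have the attribute.
with_id = books.select { |el| el["id"] }

puts match["id"]     # prints 123
puts with_id.length  # prints 2
```

The same two calls (`find` for a single match, `select` for all matches) should apply to any collection whose items answer `[]` with an attribute value or `nil`; whether a given adapter's node set does so is an assumption to check against its documentation.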
@@ -97,16 +468,13 @@ Install the gem and at least one supported XML library:
 ----
 # In your Gemfile
 gem 'moxml'
-gem 'nokogiri' # Or 'oga', 'rexml', or '
+gem 'nokogiri' # Or 'oga', 'rexml', 'ox', or 'libxml-ruby'
 ----
 
 === Basic document creation
 
 [source,ruby]
 ----
-require 'moxml'
-
-# Create a new XML document
 doc = Moxml.new.create_document
 
 # Add XML declaration
@@ -126,6 +494,42 @@ root.add_child(title)
 puts doc.to_xml(indent: 2)
 ----
 
+== Real-world examples
+
+Practical, runnable examples demonstrating Moxml usage in common scenarios are
+available in the link:examples/[examples directory].
+
+These examples include:
+
+link:examples/rss_parser/[**RSS Parser**]::
+Parse RSS/Atom feeds with XPath queries and namespace handling
+
+link:examples/web_scraper/[**Web Scraper**]::
+Extract data from HTML/XML using DOM navigation and table parsing
+
+link:examples/api_client/[**API Client**]::
+Build and parse XML API requests/responses with SOAP
+
+Each example is:
+
+* Fully documented with detailed README
+* Self-contained and runnable
+* Demonstrates best practices
+* Includes sample data files
+* Shows comprehensive error handling
+
+Run any example directly:
+
+[source,shell]
+----
+ruby examples/rss_parser/rss_parser.rb
+ruby examples/web_scraper/web_scraper.rb
+ruby examples/api_client/api_client.rb
+----
+
+See the link:examples/README.md[examples README] for complete documentation and
+learning paths.
+
 == Working with documents
 
 === Using the builder pattern
@@ -174,6 +578,7 @@ doc.add_child(root)
 # Add elements with attributes
 book = doc.create_element('book')
 book['id'] = 'b1'
+book['type'] = 'technical'
 root.add_child(book)
 
 # Add mixed content
|
@@ -183,286 +588,80 @@ title.text = 'Ruby Programming'
|
|
|
183
588
|
book.add_child(title)
|
|
184
589
|
----
|
|
185
590
|
|
|
186
|
-
|
|
187
|
-
|
|
188
|
-
=== Document object
|
|
189
|
-
|
|
190
|
-
The Document object represents an XML document and serves as the root container
|
|
191
|
-
for all XML nodes.
|
|
192
|
-
|
|
193
|
-
[source,ruby]
|
|
194
|
-
----
|
|
195
|
-
# Creating a document
|
|
196
|
-
doc = Moxml.new.create_document
|
|
197
|
-
doc = Moxml.new.parse(xml_string)
|
|
198
|
-
|
|
199
|
-
# Document properties and methods
|
|
200
|
-
doc.encoding # Get document encoding
|
|
201
|
-
doc.encoding = "UTF-8" # Set document encoding
|
|
202
|
-
doc.version # Get XML version
|
|
203
|
-
doc.version = "1.1" # Set XML version
|
|
204
|
-
doc.standalone # Get standalone declaration
|
|
205
|
-
doc.standalone = "yes" # Set standalone declaration
|
|
206
|
-
|
|
207
|
-
# Document structure
|
|
208
|
-
doc.root # Get root element
|
|
209
|
-
doc.children # Get all top-level nodes
|
|
210
|
-
doc.add_child(node) # Add a child node
|
|
211
|
-
doc.remove_child(node) # Remove a child node
|
|
212
|
-
|
|
213
|
-
# Node creation methods
|
|
214
|
-
doc.create_element(name) # Create new element
|
|
215
|
-
doc.create_text(content) # Create text node
|
|
216
|
-
doc.create_cdata(content) # Create CDATA section
|
|
217
|
-
doc.create_comment(content) # Create comment
|
|
218
|
-
doc.create_processing_instruction(target, content) # Create PI
|
|
219
|
-
|
|
220
|
-
# Document querying
|
|
221
|
-
doc.xpath(expression) # Find nodes by XPath
|
|
222
|
-
doc.at_xpath(expression) # Find first node by XPath
|
|
223
|
-
|
|
224
|
-
# Serialization
|
|
225
|
-
doc.to_xml(options) # Convert to XML string
|
|
226
|
-
----
|
|
227
|
-
|
|
228
|
-
=== Element object
|
|
591
|
+
=== Fluent interface API
|
|
229
592
|
|
|
230
|
-
|
|
231
|
-
tags with attributes and content.
|
|
593
|
+
Moxml provides a fluent, chainable API for improved developer experience:
|
|
232
594
|
|
|
233
595
|
[source,ruby]
|
|
234
596
|
----
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
element.text = "content" # Set text content
|
|
240
|
-
element.inner_text # Get text content for current node only
|
|
241
|
-
element.inner_xml # Get inner XML content
|
|
242
|
-
element.inner_xml = xml # Set inner XML content
|
|
243
|
-
|
|
244
|
-
# Attributes
|
|
245
|
-
element[name] # Get attribute value
|
|
246
|
-
element[name] = value # Set attribute value
|
|
247
|
-
element.attributes # Get all attributes
|
|
248
|
-
element.remove_attribute(name) # Remove attribute
|
|
249
|
-
|
|
250
|
-
# Namespace handling
|
|
251
|
-
element.namespace # Get element's namespace
|
|
252
|
-
element.namespace = ns # Set element's namespace
|
|
253
|
-
element.add_namespace(prefix, uri) # Add new namespace
|
|
254
|
-
element.namespaces # Get all namespace definitions
|
|
255
|
-
|
|
256
|
-
# Node structure
|
|
257
|
-
element.parent # Get parent node
|
|
258
|
-
element.children # Get child nodes
|
|
259
|
-
element.add_child(node) # Add child node
|
|
260
|
-
element.remove_child(node) # Remove child node
|
|
261
|
-
element.add_previous_sibling(node) # Add sibling before
|
|
262
|
-
element.add_next_sibling(node) # Add sibling after
|
|
263
|
-
element.replace(node) # Replace with another node
|
|
264
|
-
element.remove # Remove from document
|
|
265
|
-
|
|
266
|
-
# Node type checking
|
|
267
|
-
element.element? # Returns true
|
|
268
|
-
element.text? # Returns false
|
|
269
|
-
element.cdata? # Returns false
|
|
270
|
-
element.comment? # Returns false
|
|
271
|
-
element.processing_instruction? # Returns false
|
|
272
|
-
|
|
273
|
-
# Node querying
|
|
274
|
-
element.xpath(expression) # Find nodes by XPath
|
|
275
|
-
element.at_xpath(expression) # Find first node by XPath
|
|
276
|
-
----
|
|
277
|
-
|
|
278
|
-
=== Text object
|
|
279
|
-
|
|
280
|
-
Text nodes represent character data in the XML document.
|
|
281
|
-
|
|
282
|
-
[source,ruby]
|
|
597
|
+
element = doc.create_element('book')
|
|
598
|
+
.set_attributes(id: "123", type: "technical")
|
|
599
|
+
.with_namespace("dc", "http://purl.org/dc/elements/1.1/")
+  .with_child(doc.create_element("title"))
 ----
-# Creating text nodes
-text = doc.create_text("content")
-
-# Text properties
-text.content # Get text content
-text.content = "new" # Set text content
-
-# Node type checking
-text.text? # Returns true
-
-# Structure
-text.parent # Get parent node
-text.remove # Remove from document
-text.replace(node) # Replace with another node
-----
-
-=== CDATA object
-
-CDATA sections contain text that should not be parsed as markup.
-
-[source,ruby]
-----
-# Creating CDATA sections
-cdata = doc.create_cdata("<raw>content</raw>")
-
-# CDATA properties
-cdata.content # Get CDATA content
-cdata.content = "new" # Set CDATA content
-
-# Node type checking
-cdata.cdata? # Returns true
-
-# Structure
-cdata.parent # Get parent node
-cdata.remove # Remove from document
-cdata.replace(node) # Replace with another node
-----
-
-=== Comment object
-
-Comments contain human-readable notes in the XML document.
-
-[source,ruby]
-----
-# Creating comments
-comment = doc.create_comment("Note")
-
-# Comment properties
-comment.content # Get comment content
-comment.content = "new" # Set comment content

-
-comment.comment? # Returns true
+For complete fluent API documentation including all chainable methods, convenience methods, and practical examples, see link:docs/_guides/working-with-documents.adoc[Working with Documents Guide].

-# Structure
-comment.parent # Get parent node
-comment.remove # Remove from document
-comment.replace(node) # Replace with another node
-----

-===
+=== SAX (Event-Driven) Parsing

-
+SAX (Simple API for XML) provides memory-efficient, event-driven XML parsing for large documents.

-
-----
-# Creating processing instructions
-pi = doc.create_processing_instruction("xml-stylesheet",
-  'type="text/xsl" href="style.xsl"')
-
-# PI properties
-pi.target # Get PI target
-pi.target = "new" # Set PI target
-pi.content # Get PI content
-pi.content = "new" # Set PI content
-
-# Node type checking
-pi.processing_instruction? # Returns true
-
-# Structure
-pi.parent # Get parent node
-pi.remove # Remove from document
-pi.replace(node) # Replace with another node
-----
+**When to use SAX:**

-
+* Processing very large XML files (>100MB)
+* Memory-constrained environments
+* Streaming data extraction
+* Need to process data as it arrives

-
+**Quick example:**

 [source,ruby]
 ----
-
-
-attr.name = "new" # Set attribute name
-attr.value # Get attribute value
-attr.value = "new" # Set attribute value
-
-# Namespace handling
-attr.namespace # Get attribute's namespace
-attr.namespace = ns # Set attribute's namespace
-
-# Node type checking
-attr.attribute? # Returns true
-----
+class BookExtractor < Moxml::SAX::ElementHandler
+  attr_reader :books

-
+  def initialize
+    super
+    @books = []
+  end

-
+  def on_start_element(name, attributes = {}, namespaces = {})
+    super
+    @books << { id: attributes["id"] } if name == "book"
+  end
+end

-
+handler = BookExtractor.new
+Moxml.new.sax_parse(xml_string, handler)
+puts handler.books.inspect
 ----
-# Namespace properties
-ns.prefix # Get namespace prefix
-ns.uri # Get namespace URI

-
-ns.to_s # Format as xmlns declaration
+For complete SAX documentation including all handler types, event methods, adapter support, and best practices, see link:docs/_guides/sax-parsing.adoc[SAX Parsing Guide].

-
-ns.namespace? # Returns true
-----
-
-=== Node traversal and inspection
+== XML objects and their methods

-
+For complete node API reference including traversal methods, manipulation, queries, type checking, and node information, see link:docs/_pages/node-api-reference.adoc[Node API Reference].

-[source,ruby]
-----
-node.parent # Get parent node
-node.children # Get child nodes
-node.next_sibling # Get next sibling
-node.previous_sibling # Get previous sibling
-
-# Type checking
-node.element? # Is it an element?
-node.text? # Is it a text node?
-node.cdata? # Is it a CDATA section?
-node.comment? # Is it a comment?
-node.processing_instruction? # Is it a PI?
-node.attribute? # Is it an attribute?
-node.namespace? # Is it a namespace?
-
-# Node information
-node.document # Get owning document
-----

 == Advanced features

-=== XPath querying
-
-==== Nokogiri, Oga, REXML
+=== XPath querying

-Moxml provides efficient XPath querying
-implementation while maintaining consistent node mapping:
+Moxml provides efficient XPath querying with consistent node mapping:

 [source,ruby]
 ----
 # Find all book elements
 books = doc.xpath('//book')
-# Returns Moxml::Element objects mapped to native nodes

 # Find with namespaces
-titles = doc.xpath('//dc:title',
-  'dc' => 'http://purl.org/dc/elements/1.1/')
+titles = doc.xpath('//dc:title', 'dc' => 'http://purl.org/dc/elements/1.1/')

 # Find first matching node
 first_book = doc.at_xpath('//book')
-
-# Chain queries
-doc.xpath('//book').each do |book|
-  # Each book is a mapped Moxml::Element
-  title = book.at_xpath('.//title')
-  puts "#{book['id']}: #{title.text}"
-end
 ----

-==== Ox
-
-The native Ox's query method
-https://www.ohler.com/ox/Ox/Element.html#method-i-locate[`locate`] resembles
-XPath but has a different syntax.
-
 === Namespace handling

 [source,ruby]
@@ -472,62 +671,35 @@ element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')

 # Create element in namespace
 title = doc.create_element('dc:title')
-title.text = 'Document Title'
-
-# Query with namespaces
-doc.xpath('//dc:title',
-  'dc' => 'http://purl.org/dc/elements/1.1/')
-----
-
-=== Accessing native implementation
-
-While not typically needed, you can access the underlying XML library's nodes:
-
-[source,ruby]
 ----
-# Get native node
-native_node = element.native

-
-adapter = element.context.config.adapter
+For complete documentation on XPath querying, namespace handling, and accessing native implementations, see link:docs/_guides/advanced-features.adoc[Advanced Features Guide].

-# Create from native node
-element = Moxml::Element.new(native_node, context)
-----

 == Error handling

-Moxml provides
-occur during XML processing:
+Moxml provides comprehensive error classes with enhanced context for debugging:

 [source,ruby]
 ----
 begin
-  doc =
+  doc = Moxml.new.parse(xml_string, strict: true)
+  results = doc.xpath("//book[@id='123']")
 rescue Moxml::ParseError => e
-
-  puts "Parse error at line #{e.line}, column #{e.column}"
-  puts "Message: #{e.message}"
-rescue Moxml::ValidationError => e
-  # Handles XML validation errors
-  puts "Validation error: #{e.message}"
+  puts "Parse failed at line #{e.line}: #{e.message}"
 rescue Moxml::XPathError => e
-
-  puts "XPath error: #{e.message}"
-rescue Moxml::NamespaceError => e
-  # Handles namespace errors
-  puts "Namespace error: #{e.message}"
+  puts "XPath error: #{e.expression}"
 rescue Moxml::Error => e
-
-  puts "Error: #{e.message}"
+  puts "XML processing error: #{e.message}"
 end
 ----
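The rescue chain in the error-handling example lists the specific error classes before `Moxml::Error`. The sketch below illustrates why that ordering matters, assuming (as the ordering suggests, but the diff does not confirm) that the specific classes inherit from the base class. `DemoError`, `DemoParseError`, `DemoXPathError`, and `describe_failure` are hypothetical stand-ins written for this illustration; they are not Moxml's classes, and only the `line` and `expression` attribute names are taken from the example above.

```ruby
# Hypothetical stand-in hierarchy; these are NOT Moxml's error classes.
class DemoError < StandardError; end

# Carries the line number where parsing failed (assumed attribute).
class DemoParseError < DemoError
  attr_reader :line

  def initialize(message, line: nil)
    super(message)
    @line = line
  end
end

# Carries the XPath expression that failed (assumed attribute).
class DemoXPathError < DemoError
  attr_reader :expression

  def initialize(message, expression: nil)
    super(message)
    @expression = expression
  end
end

# Ruby tries rescue clauses top to bottom, so the specific classes come
# first and the base class acts as the catch-all at the end.
def describe_failure
  yield
  "ok"
rescue DemoParseError => e
  "Parse failed at line #{e.line}: #{e.message}"
rescue DemoXPathError => e
  "XPath error: #{e.expression}"
rescue DemoError => e
  "XML processing error: #{e.message}"
end

puts describe_failure { raise DemoParseError.new("unexpected end of input", line: 7) }
puts describe_failure { raise DemoXPathError.new("bad predicate", expression: "//book[") }
puts describe_failure { raise DemoError, "adapter not available" }
```

If the base-class clause were listed first, it would match every subclass as well, and the more specific messages would never be produced.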

-
+For complete error class hierarchy, error types, best practices, and debugging techniques, see link:docs/_pages/error-handling.adoc[Error Handling Guide].

-=== General

-
+== Configuration
+
+Moxml can be configured globally or per instance:

 [source,ruby]
 ----
@@ -539,124 +711,115 @@ Moxml.configure do |config|
 end

 # Instance configuration
-
+context = Moxml.new do |config|
   config.adapter = :oga
   config.strict = false
 end
 ----

-
+For all configuration options, adapter selection, serialization options, and environment-based configuration, see link:docs/_pages/configuration.adoc[Configuration Guide].

-To select a non-default adapter, set it before processing any input using the
-following syntax.

-[source,ruby]
-----
-Moxml::Config.default_adapter = <adapter-symbol>
-----

-
+== Thread safety

-
+For complete information on thread-safe patterns, context management, and concurrent processing, see the link:docs/_pages/thread-safety.adoc[Thread Safety Guide].

-`:nokogiri`:: Nokogiri (default)

-
+== Performance considerations

-
+For detailed performance optimization strategies, memory management best practices, and efficient querying patterns, see the link:docs/_pages/performance.adoc[Performance Considerations Guide].

+== Best practices

-
+For comprehensive best practices covering XPath queries, adapter selection, error handling, namespace handling, memory management, thread safety, performance optimization, and testing strategies, see link:docs/_pages/best-practices.adoc[Best Practices Guide].

-Moxml is thread-safe when used properly. Each instance maintains its own state
-and can be used safely in concurrent operations:

-
-----
-class XmlProcessor
-  def initialize
-    @mutex = Mutex.new
-    @context = Moxml.new
-  end
+== Specific adapter limitations

-
-    @mutex.synchronize do
-      doc = @context.parse(xml)
-      # Modify document
-      doc.to_xml
-    end
-  end
-end
-----
+=== Ox adapter

-
+The Ox adapter provides maximum parsing speed but has XPath limitations.

-
+**XPath limitations:**

-
+* No attribute value predicates: `//book[@id='123']` ❌
+* No logical operators, position predicates, text predicates ❌
+* No namespace queries, parent axis, sibling axes ❌
+* No XPath functions ❌
+
+**Workaround:** Use Ruby enumerable methods:

 [source,ruby]
 ----
-
-
-doc = nil # Allow garbage collection of document and registry
-GC.start # Force garbage collection if needed
+# Instead of: doc.xpath("//book[@id='123']")
+doc.xpath("//book").find { |book| book["id"] == "123" }
 ----
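The same Enumerable approach can plausibly cover the other predicate kinds listed as unsupported under Ox. The sketch below runs without Moxml by using stand-in objects: `FakeBook`, `BOOKS`, and the helper methods are hypothetical names invented for this illustration, and only the `book["id"]` attribute lookup mirrors the workaround shown above. With a real document, the list would come from `doc.xpath("//book")` instead.

```ruby
# Stand-in for an element node. Only attribute lookup (node["name"]) and a
# title accessor are modelled; this is NOT Moxml's Element class.
class FakeBook
  attr_reader :title

  def initialize(attributes, title)
    @attributes = attributes
    @title = title
  end

  def [](name)
    @attributes[name]
  end
end

# Stand-in for the node list that doc.xpath("//book") would return.
BOOKS = [
  FakeBook.new({ "id" => "123", "price" => "15" }, "Ruby Basics"),
  FakeBook.new({ "id" => "456", "price" => "35" }, "Advanced Ruby"),
  FakeBook.new({ "id" => "789", "price" => "18" }, "XML in Depth"),
].freeze

# Replaces //book[@id='123']: match on an attribute value.
def book_by_id(books, id)
  books.find { |book| book["id"] == id }
end

# Replaces //book[@price < 20]: numeric comparison done in Ruby.
def cheaper_than(books, limit)
  books.select { |book| book["price"].to_f < limit }
end

# Replaces a predicate that joins two conditions with `and`.
def cheap_ruby_books(books, limit)
  cheaper_than(books, limit).select { |book| book.title.include?("Ruby") }
end

# Replaces (//book)[2]: XPath positions are 1-based, Ruby arrays 0-based.
def nth_book(books, position)
  books[position - 1]
end

puts book_by_id(BOOKS, "123").title
puts cheaper_than(BOOKS, 20).map(&:title).inspect
puts cheap_ruby_books(BOOKS, 20).map(&:title).inspect
puts nth_book(BOOKS, 2).title
```

Filtering in Ruby means every candidate node is materialized first, so on large documents this trades some speed and memory for the broader query support.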

-
+For complete Ox adapter documentation including all limitations and workarounds, see link:docs/_pages/adapters/ox.adoc[Ox Adapter Guide].

-
+=== HeadedOx adapter

-
-----
-# More efficient - specific path
-doc.xpath('//book/title')
+The HeadedOx adapter combines Ox's fast C-based XML parsing with Moxml's comprehensive pure Ruby XPath 1.0 engine.

-
-doc.xpath('//title')
+**Status:** Production-ready v1.2 (99.20% pass rate, 1,992/2,008 tests)

-
-root.xpath('./*/title')
-----
+**Key features:**

-
+* Fast XML parsing (Ox C extension)
+* All 27 XPath 1.0 functions
+* 6 XPath axes (child, descendant, parent, attribute, self, descendant-or-self)
+* Expression caching for performance
+* Pure Ruby XPath engine (debuggable)

-
+**When to use:**
+
+* Need Ox's fast parsing with comprehensive XPath
+* Want XPath functions (count, sum, contains, etc.)
+* Prefer pure Ruby XPath for debugging
+* Basic namespace queries are sufficient

 [source,ruby]
 ----
-#
-
-
-  element 'root' do
-    element 'child' do
-      text 'content'
-    end
-  end
-end
+# Use HeadedOx adapter
+context = Moxml.new(:headed_ox)
+doc = context.parse(xml_string)

-#
-
-doc.
-
-doc.add_child(root)
+# Full XPath 1.0 support
+books = doc.xpath('//book[@price < 20]')
+count = doc.xpath('count(//book)')
+titles = doc.xpath('//book/title[contains(., "Ruby")]')
 ----

-
+For complete HeadedOx documentation including architecture, XPath capabilities, known limitations, and usage examples, see link:docs/_pages/adapters/headed-ox.adoc[HeadedOx Adapter Guide] and link:docs/HEADED_OX_LIMITATIONS.md[Limitations Documentation].

-
-
-
-
-
-
-
-
-
-
-
-
-
+
+==== LibXML adapter
+
+*DOCTYPE Limitations:*
+
+* DOCTYPE parsing works
+* DOCTYPE round-trip preservation is limited
+* DOCTYPE cannot be reliably re-serialized after parsing
+
+*Performance:*
+
+* Serialization speed: ~120 ips (slower than target)
+* Parsing speed: Good
+* For high-throughput serialization, consider Ox or Nokogiri
+
+=== Other adapters
+
+*Nokogiri, Oga, REXML:*
+
+All three adapters have near-complete feature support with only minor edge case
+limitations. Use these adapters when you need full XPath and namespace support.
+
+
+
+== Development and testing
+
+For complete information on development setup, testing strategies, benchmarking, and coverage reporting, see the link:docs/_guides/development-testing.adoc[Development and Testing Guide].

 == Contributing

@@ -670,6 +833,5 @@ end

 Copyright Ribose.

-This project is licensed under the
+This project is licensed under the Ribose 3-Clause BSD License. See the
 link:LICENSE.md[] file for details.
-