moxml 0.1.6 → 0.1.8
- checksums.yaml +4 -4
- data/.github/workflows/dependent-repos.json +5 -0
- data/.github/workflows/dependent-tests.yml +20 -0
- data/.github/workflows/docs.yml +59 -0
- data/.github/workflows/rake.yml +12 -4
- data/.github/workflows/release.yml +5 -3
- data/.gitignore +37 -0
- data/.rubocop.yml +15 -7
- data/.rubocop_todo.yml +238 -40
- data/Gemfile +14 -9
- data/LICENSE.md +6 -2
- data/README.adoc +535 -373
- data/Rakefile +53 -0
- data/benchmarks/.gitignore +6 -0
- data/benchmarks/generate_report.rb +550 -0
- data/docs/Gemfile +13 -0
- data/docs/_config.yml +138 -0
- data/docs/_guides/advanced-features.adoc +87 -0
- data/docs/_guides/development-testing.adoc +165 -0
- data/docs/_guides/index.adoc +45 -0
- data/docs/_guides/modifying-xml.adoc +293 -0
- data/docs/_guides/parsing-xml.adoc +231 -0
- data/docs/_guides/sax-parsing.adoc +603 -0
- data/docs/_guides/working-with-documents.adoc +118 -0
- data/docs/_pages/adapter-compatibility.adoc +369 -0
- data/docs/_pages/adapters/headed-ox.adoc +237 -0
- data/docs/_pages/adapters/index.adoc +98 -0
- data/docs/_pages/adapters/libxml.adoc +286 -0
- data/docs/_pages/adapters/nokogiri.adoc +252 -0
- data/docs/_pages/adapters/oga.adoc +292 -0
- data/docs/_pages/adapters/ox.adoc +55 -0
- data/docs/_pages/adapters/rexml.adoc +293 -0
- data/docs/_pages/best-practices.adoc +430 -0
- data/docs/_pages/compatibility.adoc +468 -0
- data/docs/_pages/configuration.adoc +251 -0
- data/docs/_pages/error-handling.adoc +350 -0
- data/docs/_pages/headed-ox-limitations.adoc +558 -0
- data/docs/_pages/headed-ox.adoc +1025 -0
- data/docs/_pages/index.adoc +35 -0
- data/docs/_pages/installation.adoc +141 -0
- data/docs/_pages/node-api-reference.adoc +50 -0
- data/docs/_pages/performance.adoc +36 -0
- data/docs/_pages/quick-start.adoc +244 -0
- data/docs/_pages/thread-safety.adoc +29 -0
- data/docs/_references/document-api.adoc +408 -0
- data/docs/_references/index.adoc +48 -0
- data/docs/_tutorials/basic-usage.adoc +268 -0
- data/docs/_tutorials/builder-pattern.adoc +343 -0
- data/docs/_tutorials/index.adoc +33 -0
- data/docs/_tutorials/namespace-handling.adoc +325 -0
- data/docs/_tutorials/xpath-queries.adoc +359 -0
- data/docs/index.adoc +122 -0
- data/examples/README.md +124 -0
- data/examples/api_client/README.md +424 -0
- data/examples/api_client/api_client.rb +394 -0
- data/examples/api_client/example_response.xml +48 -0
- data/examples/headed_ox_example/README.md +90 -0
- data/examples/headed_ox_example/headed_ox_demo.rb +71 -0
- data/examples/rss_parser/README.md +194 -0
- data/examples/rss_parser/example_feed.xml +93 -0
- data/examples/rss_parser/rss_parser.rb +189 -0
- data/examples/sax_parsing/README.md +50 -0
- data/examples/sax_parsing/data_extractor.rb +75 -0
- data/examples/sax_parsing/example.xml +21 -0
- data/examples/sax_parsing/large_file.rb +78 -0
- data/examples/sax_parsing/simple_parser.rb +55 -0
- data/examples/web_scraper/README.md +352 -0
- data/examples/web_scraper/example_page.html +201 -0
- data/examples/web_scraper/web_scraper.rb +312 -0
- data/lib/moxml/adapter/base.rb +107 -28
- data/lib/moxml/adapter/customized_libxml/cdata.rb +28 -0
- data/lib/moxml/adapter/customized_libxml/comment.rb +24 -0
- data/lib/moxml/adapter/customized_libxml/declaration.rb +85 -0
- data/lib/moxml/adapter/customized_libxml/element.rb +39 -0
- data/lib/moxml/adapter/customized_libxml/node.rb +44 -0
- data/lib/moxml/adapter/customized_libxml/processing_instruction.rb +31 -0
- data/lib/moxml/adapter/customized_libxml/text.rb +27 -0
- data/lib/moxml/adapter/customized_oga/xml_generator.rb +1 -1
- data/lib/moxml/adapter/customized_ox/attribute.rb +28 -3
- data/lib/moxml/adapter/customized_ox/namespace.rb +0 -2
- data/lib/moxml/adapter/customized_ox/text.rb +0 -2
- data/lib/moxml/adapter/customized_rexml/formatter.rb +11 -6
- data/lib/moxml/adapter/headed_ox.rb +161 -0
- data/lib/moxml/adapter/libxml.rb +1548 -0
- data/lib/moxml/adapter/nokogiri.rb +121 -9
- data/lib/moxml/adapter/oga.rb +123 -12
- data/lib/moxml/adapter/ox.rb +283 -27
- data/lib/moxml/adapter/rexml.rb +127 -20
- data/lib/moxml/adapter.rb +21 -4
- data/lib/moxml/attribute.rb +6 -0
- data/lib/moxml/builder.rb +40 -4
- data/lib/moxml/config.rb +8 -3
- data/lib/moxml/context.rb +39 -1
- data/lib/moxml/doctype.rb +13 -1
- data/lib/moxml/document.rb +39 -6
- data/lib/moxml/document_builder.rb +27 -5
- data/lib/moxml/element.rb +71 -2
- data/lib/moxml/error.rb +175 -6
- data/lib/moxml/node.rb +94 -3
- data/lib/moxml/node_set.rb +34 -0
- data/lib/moxml/sax/block_handler.rb +194 -0
- data/lib/moxml/sax/element_handler.rb +124 -0
- data/lib/moxml/sax/handler.rb +113 -0
- data/lib/moxml/sax.rb +31 -0
- data/lib/moxml/version.rb +1 -1
- data/lib/moxml/xml_utils/encoder.rb +4 -4
- data/lib/moxml/xml_utils.rb +7 -4
- data/lib/moxml/xpath/ast/node.rb +159 -0
- data/lib/moxml/xpath/cache.rb +91 -0
- data/lib/moxml/xpath/compiler.rb +1768 -0
- data/lib/moxml/xpath/context.rb +26 -0
- data/lib/moxml/xpath/conversion.rb +124 -0
- data/lib/moxml/xpath/engine.rb +52 -0
- data/lib/moxml/xpath/errors.rb +101 -0
- data/lib/moxml/xpath/lexer.rb +304 -0
- data/lib/moxml/xpath/parser.rb +485 -0
- data/lib/moxml/xpath/ruby/generator.rb +269 -0
- data/lib/moxml/xpath/ruby/node.rb +193 -0
- data/lib/moxml/xpath.rb +37 -0
- data/lib/moxml.rb +5 -2
- data/moxml.gemspec +3 -1
- data/old-specs/moxml/adapter/customized_libxml/.gitkeep +6 -0
- data/spec/consistency/README.md +77 -0
- data/spec/{moxml/examples/adapter_spec.rb → consistency/adapter_parity_spec.rb} +4 -4
- data/spec/examples/README.md +75 -0
- data/spec/{support/shared_examples/examples/attribute.rb → examples/attribute_examples_spec.rb} +1 -1
- data/spec/{support/shared_examples/examples/basic_usage.rb → examples/basic_usage_spec.rb} +2 -2
- data/spec/{support/shared_examples/examples/namespace.rb → examples/namespace_examples_spec.rb} +3 -3
- data/spec/{support/shared_examples/examples/readme_examples.rb → examples/readme_examples_spec.rb} +6 -4
- data/spec/{support/shared_examples/examples/xpath.rb → examples/xpath_examples_spec.rb} +10 -6
- data/spec/integration/README.md +71 -0
- data/spec/{moxml/all_with_adapters_spec.rb → integration/all_adapters_spec.rb} +3 -2
- data/spec/integration/headed_ox_integration_spec.rb +326 -0
- data/spec/{support → integration}/shared_examples/edge_cases.rb +37 -10
- data/spec/integration/shared_examples/high_level/.gitkeep +0 -0
- data/spec/{support/shared_examples/context.rb → integration/shared_examples/high_level/context_behavior.rb} +2 -1
- data/spec/{support/shared_examples/integration.rb → integration/shared_examples/integration_workflows.rb} +23 -6
- data/spec/integration/shared_examples/node_wrappers/.gitkeep +0 -0
- data/spec/{support/shared_examples/cdata.rb → integration/shared_examples/node_wrappers/cdata_behavior.rb} +6 -1
- data/spec/{support/shared_examples/comment.rb → integration/shared_examples/node_wrappers/comment_behavior.rb} +2 -1
- data/spec/{support/shared_examples/declaration.rb → integration/shared_examples/node_wrappers/declaration_behavior.rb} +5 -2
- data/spec/{support/shared_examples/doctype.rb → integration/shared_examples/node_wrappers/doctype_behavior.rb} +2 -2
- data/spec/{support/shared_examples/document.rb → integration/shared_examples/node_wrappers/document_behavior.rb} +1 -1
- data/spec/{support/shared_examples/node.rb → integration/shared_examples/node_wrappers/node_behavior.rb} +9 -2
- data/spec/{support/shared_examples/node_set.rb → integration/shared_examples/node_wrappers/node_set_behavior.rb} +1 -18
- data/spec/{support/shared_examples/processing_instruction.rb → integration/shared_examples/node_wrappers/processing_instruction_behavior.rb} +6 -2
- data/spec/moxml/README.md +41 -0
- data/spec/moxml/adapter/.gitkeep +0 -0
- data/spec/moxml/adapter/README.md +61 -0
- data/spec/moxml/adapter/base_spec.rb +27 -0
- data/spec/moxml/adapter/headed_ox_spec.rb +311 -0
- data/spec/moxml/adapter/libxml_spec.rb +14 -0
- data/spec/moxml/adapter/ox_spec.rb +9 -8
- data/spec/moxml/adapter/shared_examples/.gitkeep +0 -0
- data/spec/{support/shared_examples/xml_adapter.rb → moxml/adapter/shared_examples/adapter_contract.rb} +39 -12
- data/spec/moxml/adapter_spec.rb +16 -0
- data/spec/moxml/attribute_spec.rb +30 -0
- data/spec/moxml/builder_spec.rb +33 -0
- data/spec/moxml/cdata_spec.rb +31 -0
- data/spec/moxml/comment_spec.rb +31 -0
- data/spec/moxml/config_spec.rb +3 -3
- data/spec/moxml/context_spec.rb +28 -0
- data/spec/moxml/declaration_spec.rb +36 -0
- data/spec/moxml/doctype_spec.rb +33 -0
- data/spec/moxml/document_builder_spec.rb +30 -0
- data/spec/moxml/document_spec.rb +105 -0
- data/spec/moxml/element_spec.rb +143 -0
- data/spec/moxml/error_spec.rb +266 -22
- data/spec/{moxml_spec.rb → moxml/moxml_spec.rb} +9 -9
- data/spec/moxml/namespace_spec.rb +32 -0
- data/spec/moxml/node_set_spec.rb +39 -0
- data/spec/moxml/node_spec.rb +37 -0
- data/spec/moxml/processing_instruction_spec.rb +34 -0
- data/spec/moxml/sax_spec.rb +1067 -0
- data/spec/moxml/text_spec.rb +31 -0
- data/spec/moxml/version_spec.rb +14 -0
- data/spec/moxml/xml_utils/.gitkeep +0 -0
- data/spec/moxml/xml_utils/encoder_spec.rb +27 -0
- data/spec/moxml/xml_utils_spec.rb +49 -0
- data/spec/moxml/xpath/ast/node_spec.rb +83 -0
- data/spec/moxml/xpath/axes_spec.rb +296 -0
- data/spec/moxml/xpath/cache_spec.rb +358 -0
- data/spec/moxml/xpath/compiler_spec.rb +406 -0
- data/spec/moxml/xpath/context_spec.rb +210 -0
- data/spec/moxml/xpath/conversion_spec.rb +365 -0
- data/spec/moxml/xpath/fixtures/sample.xml +25 -0
- data/spec/moxml/xpath/functions/boolean_functions_spec.rb +114 -0
- data/spec/moxml/xpath/functions/node_functions_spec.rb +145 -0
- data/spec/moxml/xpath/functions/numeric_functions_spec.rb +164 -0
- data/spec/moxml/xpath/functions/position_functions_spec.rb +93 -0
- data/spec/moxml/xpath/functions/special_functions_spec.rb +89 -0
- data/spec/moxml/xpath/functions/string_functions_spec.rb +381 -0
- data/spec/moxml/xpath/lexer_spec.rb +488 -0
- data/spec/moxml/xpath/parser_integration_spec.rb +210 -0
- data/spec/moxml/xpath/parser_spec.rb +364 -0
- data/spec/moxml/xpath/ruby/generator_spec.rb +421 -0
- data/spec/moxml/xpath/ruby/node_spec.rb +291 -0
- data/spec/moxml/xpath_capabilities_spec.rb +199 -0
- data/spec/moxml/xpath_spec.rb +77 -0
- data/spec/performance/README.md +83 -0
- data/spec/performance/benchmark_spec.rb +64 -0
- data/spec/{support/shared_examples/examples/memory.rb → performance/memory_usage_spec.rb} +3 -1
- data/spec/{support/shared_examples/examples/thread_safety.rb → performance/thread_safety_spec.rb} +3 -1
- data/spec/performance/xpath_benchmark_spec.rb +259 -0
- data/spec/spec_helper.rb +58 -1
- data/spec/support/xml_matchers.rb +1 -1
- metadata +176 -35
- data/lib/ox/node.rb +0 -9
- data/spec/support/shared_examples/examples/benchmark_spec.rb +0 -51
- /data/spec/{support/shared_examples/builder.rb → integration/shared_examples/high_level/builder_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/document_builder.rb → integration/shared_examples/high_level/document_builder_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/attribute.rb → integration/shared_examples/node_wrappers/attribute_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/element.rb → integration/shared_examples/node_wrappers/element_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/namespace.rb → integration/shared_examples/node_wrappers/namespace_behavior.rb} +0 -0
- /data/spec/{support/shared_examples/text.rb → integration/shared_examples/node_wrappers/text_behavior.rb} +0 -0
|
@@ -0,0 +1,359 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: XPath queries
|
|
3
|
+
parent: Overview
|
|
4
|
+
nav_order: 3
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
== XPath queries
|
|
8
|
+
|
|
9
|
+
=== Purpose
|
|
10
|
+
|
|
11
|
+
Master XPath querying in Moxml to efficiently find and select XML nodes using
|
|
12
|
+
path expressions, predicates, and functions.
|
|
13
|
+
|
|
14
|
+
=== Prerequisites
|
|
15
|
+
|
|
16
|
+
* Basic understanding of XPath syntax
|
|
17
|
+
* Moxml installed with an adapter
|
|
18
|
+
* Familiarity with link:basic-usage[basic Moxml usage]
|
|
19
|
+
|
|
20
|
+
=== Step 1: Basic path expressions
|
|
21
|
+
|
|
22
|
+
Start with simple path selection:
|
|
23
|
+
|
|
24
|
+
[source,ruby]
|
|
25
|
+
----
|
|
26
|
+
require 'moxml'
|
|
27
|
+
|
|
28
|
+
xml = <<~XML
|
|
29
|
+
<library>
|
|
30
|
+
<section name="programming">
|
|
31
|
+
<book id="1">Ruby Basics</book>
|
|
32
|
+
<book id="2">Advanced Ruby</book>
|
|
33
|
+
</section>
|
|
34
|
+
<section name="fiction">
|
|
35
|
+
<book id="3">Ruby Story</book>
|
|
36
|
+
</section>
|
|
37
|
+
</library>
|
|
38
|
+
XML
|
|
39
|
+
|
|
40
|
+
doc = Moxml.new.parse(xml)
|
|
41
|
+
|
|
42
|
+
# Absolute path - from root
|
|
43
|
+
sections = doc.xpath('/library/section')
|
|
44
|
+
puts sections.length # => 2
|
|
45
|
+
|
|
46
|
+
# Descendant-or-self - any depth
|
|
47
|
+
all_books = doc.xpath('//book')
|
|
48
|
+
puts all_books.length # => 3
|
|
49
|
+
|
|
50
|
+
# Relative path - from current node
|
|
51
|
+
section = sections.first
|
|
52
|
+
books = section.xpath('.//book')
|
|
53
|
+
puts books.length # => 2
|
|
54
|
+
|
|
55
|
+
# Parent path
|
|
56
|
+
book = all_books.first
|
|
57
|
+
parent = book.xpath('..').first
|
|
58
|
+
puts parent.name # => "section"
|
|
59
|
+
----
|
|
60
|
+
|
|
61
|
+
=== Step 2: Attribute predicates
|
|
62
|
+
|
|
63
|
+
Filter elements by attributes:
|
|
64
|
+
|
|
65
|
+
[source,ruby]
|
|
66
|
+
----
|
|
67
|
+
# Elements with specific attribute
|
|
68
|
+
books_with_id = doc.xpath('//book[@id]')
|
|
69
|
+
puts books_with_id.length # => 3 (all have id)
|
|
70
|
+
|
|
71
|
+
# Elements with attribute value
|
|
72
|
+
book1 = doc.at_xpath('//book[@id="1"]')
|
|
73
|
+
puts book1.text # => "Ruby Basics"
|
|
74
|
+
|
|
75
|
+
# Attribute value on another element
|
|
76
|
+
sections = doc.xpath('//section[@name="programming"]')
|
|
77
|
+
puts sections.first['name'] # => "programming"
|
|
78
|
+
----
|
|
79
|
+
|
|
80
|
+
=== Step 3: Position predicates
|
|
81
|
+
|
|
82
|
+
Select by position:
|
|
83
|
+
|
|
84
|
+
[source,ruby]
|
|
85
|
+
----
|
|
86
|
+
# First book
|
|
87
|
+
first = doc.at_xpath('//book[1]')
|
|
88
|
+
puts first.text # => "Ruby Basics"
|
|
89
|
+
|
|
90
|
+
# Last book
|
|
91
|
+
last = doc.at_xpath('(//book)[last()]') # parentheses: last book in the document, not per section
|
|
92
|
+
puts last.text # => "Ruby Story"
|
|
93
|
+
|
|
94
|
+
# First two books
|
|
95
|
+
first_two = doc.xpath('(//book)[position() <= 2]') # parentheses: positions count across the whole document
|
|
96
|
+
puts first_two.length # => 2
|
|
97
|
+
|
|
98
|
+
# Every second book
|
|
99
|
+
even_books = doc.xpath('//book[position() mod 2 = 0]')
|
|
100
|
+
----
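Position predicates are evaluated per parent step: in `//book[last()]` the predicate applies within each parent's `book` children, whereas `(//book)[last()]` indexes the combined result. The plain-Ruby sketch below models that grouping with nested arrays standing in for the Step 1 document. No XPath engine is involved, and the `sections` structure is purely illustrative.

```ruby
# Stand-in for the Step 1 document: each inner array holds one <section>'s books.
sections = [
  ['Ruby Basics', 'Advanced Ruby'], # section "programming"
  ['Ruby Story']                    # section "fiction"
]

# //book[last()]          -> last book of *each* section
last_per_section = sections.map(&:last)

# (//book)[last()]        -> last book of the whole document
last_overall = sections.flatten.last

# //book[position() <= 2] -> up to two books from *each* section
first_two_per_section = sections.flat_map { |books| books.first(2) }

p last_per_section             # => ["Advanced Ruby", "Ruby Story"]
p last_overall                 # => "Ruby Story"
p first_two_per_section.length # => 3
```

When a query should count positions across the whole document, wrapping the path in parentheses before the predicate is the usual XPath 1.0 idiom.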
|
|
101
|
+
|
|
102
|
+
=== Step 4: Text predicates
|
|
103
|
+
|
|
104
|
+
Filter by text content:
|
|
105
|
+
|
|
106
|
+
[source,ruby]
|
|
107
|
+
----
|
|
108
|
+
xml = <<~XML
|
|
109
|
+
<library>
|
|
110
|
+
<book><title>Ruby Programming</title></book>
|
|
111
|
+
<book><title>Python Programming</title></book>
|
|
112
|
+
<book><title>Advanced Ruby</title></book>
|
|
113
|
+
</library>
|
|
114
|
+
XML
|
|
115
|
+
|
|
116
|
+
doc = Moxml.new.parse(xml)
|
|
117
|
+
|
|
118
|
+
# Exact text match
|
|
119
|
+
ruby_book = doc.at_xpath('//title[text()="Ruby Programming"]')
|
|
120
|
+
puts ruby_book.text # => "Ruby Programming"
|
|
121
|
+
|
|
122
|
+
# Contains text
|
|
123
|
+
ruby_books = doc.xpath('//title[contains(text(), "Ruby")]')
|
|
124
|
+
puts ruby_books.length # => 2
|
|
125
|
+
|
|
126
|
+
# Starts with
|
|
127
|
+
starts_with_ruby = doc.xpath('//title[starts-with(text(), "Ruby")]')
|
|
128
|
+
----
|
|
129
|
+
|
|
130
|
+
=== Step 5: Logical operators
|
|
131
|
+
|
|
132
|
+
Combine conditions:
|
|
133
|
+
|
|
134
|
+
[source,ruby]
|
|
135
|
+
----
|
|
136
|
+
xml = <<~XML
|
|
137
|
+
<library>
|
|
138
|
+
<book id="1" category="programming" price="29.99">
|
|
139
|
+
<title>Ruby Basics</title>
|
|
140
|
+
</book>
|
|
141
|
+
<book id="2" category="fiction" price="19.99">
|
|
142
|
+
<title>Ruby Story</title>
|
|
143
|
+
</book>
|
|
144
|
+
<book id="3" category="programming" price="39.99">
|
|
145
|
+
<title>Advanced Ruby</title>
|
|
146
|
+
</book>
|
|
147
|
+
</library>
|
|
148
|
+
XML
|
|
149
|
+
|
|
150
|
+
doc = Moxml.new.parse(xml)
|
|
151
|
+
|
|
152
|
+
# AND condition
|
|
153
|
+
cheap_programming = doc.xpath('//book[@category="programming" and @price < 35]')
|
|
154
|
+
puts cheap_programming.length # => 1
|
|
155
|
+
|
|
156
|
+
# OR condition
|
|
157
|
+
fiction_or_cheap = doc.xpath('//book[@category="fiction" or @price < 25]')
|
|
158
|
+
puts fiction_or_cheap.length # => 1 (only book 2 is fiction or under 25)
|
|
159
|
+
|
|
160
|
+
# Complex conditions
|
|
161
|
+
results = doc.xpath('//book[(@category="programming" and @price < 40) or @id="2"]')
|
|
162
|
+
puts results.length # => 3 (books 1 and 3 via the first clause, book 2 via @id)
|
|
163
|
+
----
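One way to sanity-check the expected counts for compound predicates is to replay the same conditions in plain Ruby. The sketch below copies the attribute values from the Step 5 sample into hashes; the `books` array and the `price` lambda are illustrative helpers, not part of Moxml.

```ruby
# Attribute values copied from the Step 5 sample document.
books = [
  { id: '1', category: 'programming', price: '29.99' },
  { id: '2', category: 'fiction',     price: '19.99' },
  { id: '3', category: 'programming', price: '39.99' }
]

# XPath converts @price to a number before comparing; to_f mirrors that here.
price = ->(b) { b[:price].to_f }

# //book[@category="programming" and @price < 35]
and_matches = books.select { |b| b[:category] == 'programming' && price.(b) < 35 }

# //book[@category="fiction" or @price < 25]
or_matches = books.select { |b| b[:category] == 'fiction' || price.(b) < 25 }

# //book[(@category="programming" and @price < 40) or @id="2"]
mixed = books.select do |b|
  (b[:category] == 'programming' && price.(b) < 40) || b[:id] == '2'
end

p and_matches.map { |b| b[:id] } # => ["1"]
p or_matches.map { |b| b[:id] }  # => ["2"]
p mixed.map { |b| b[:id] }       # => ["1", "2", "3"]
```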
|
|
164
|
+
|
|
165
|
+
=== Step 6: XPath functions
|
|
166
|
+
|
|
167
|
+
Use built-in XPath functions:
|
|
168
|
+
|
|
169
|
+
[source,ruby]
|
|
170
|
+
----
|
|
171
|
+
# Count function
|
|
172
|
+
book_count = doc.xpath('count(//book)')
|
|
173
|
+
puts book_count # => 3
|
|
174
|
+
|
|
175
|
+
# String functions
|
|
176
|
+
titles = doc.xpath('//title[string-length(text()) > 10]')
|
|
177
|
+
|
|
178
|
+
# Concat function
|
|
179
|
+
full_title = doc.xpath('concat(//book[1]/title, " - Edition 2")')
|
|
180
|
+
|
|
181
|
+
# Position functions
|
|
182
|
+
middle_books = doc.xpath('//book[position() > 1 and position() < last()]')
|
|
183
|
+
----
|
|
184
|
+
|
|
185
|
+
=== Step 7: Namespace-aware queries
|
|
186
|
+
|
|
187
|
+
Query XML with namespaces:
|
|
188
|
+
|
|
189
|
+
[source,ruby]
|
|
190
|
+
----
|
|
191
|
+
xml = <<~XML
|
|
192
|
+
<library xmlns="http://example.org/library"
|
|
193
|
+
xmlns:dc="http://purl.org/dc/elements/1.1/">
|
|
194
|
+
<book>
|
|
195
|
+
<dc:title>Ruby Programming</dc:title>
|
|
196
|
+
<dc:creator>Jane Smith</dc:creator>
|
|
197
|
+
</book>
|
|
198
|
+
</library>
|
|
199
|
+
XML
|
|
200
|
+
|
|
201
|
+
doc = Moxml.new.parse(xml)
|
|
202
|
+
|
|
203
|
+
# Define namespace mappings
|
|
204
|
+
namespaces = {
|
|
205
|
+
'lib' => 'http://example.org/library',
|
|
206
|
+
'dc' => 'http://purl.org/dc/elements/1.1/'
|
|
207
|
+
}
|
|
208
|
+
|
|
209
|
+
# Query with namespace prefixes
|
|
210
|
+
books = doc.xpath('//lib:book', namespaces)
|
|
211
|
+
titles = doc.xpath('//dc:title', namespaces)
|
|
212
|
+
creators = doc.xpath('//dc:creator', namespaces)
|
|
213
|
+
|
|
214
|
+
puts titles.first.text # => "Ruby Programming"
|
|
215
|
+
puts creators.first.text # => "Jane Smith"
|
|
216
|
+
|
|
217
|
+
# Complex namespace queries
|
|
218
|
+
all_dc_elements = doc.xpath('//dc:*', namespaces)
|
|
219
|
+
puts all_dc_elements.length # => 2 (title + creator)
|
|
220
|
+
----
|
|
221
|
+
|
|
222
|
+
=== Step 8: Axes and advanced selection
|
|
223
|
+
|
|
224
|
+
Use XPath axes for complex traversal:
|
|
225
|
+
|
|
226
|
+
[source,ruby]
|
|
227
|
+
----
|
|
228
|
+
xml = <<~XML
|
|
229
|
+
<book>
|
|
230
|
+
<chapter id="1">Introduction</chapter>
|
|
231
|
+
<chapter id="2">Basics</chapter>
|
|
232
|
+
<chapter id="3">Advanced</chapter>
|
|
233
|
+
</book>
|
|
234
|
+
XML
|
|
235
|
+
|
|
236
|
+
doc = Moxml.new.parse(xml)
|
|
237
|
+
|
|
238
|
+
# Following sibling
|
|
239
|
+
chapter1 = doc.at_xpath('//chapter[@id="1"]')
|
|
240
|
+
next_chapters = chapter1.xpath('following-sibling::chapter')
|
|
241
|
+
puts next_chapters.length # => 2
|
|
242
|
+
|
|
243
|
+
# Preceding sibling
|
|
244
|
+
chapter3 = doc.at_xpath('//chapter[@id="3"]')
|
|
245
|
+
prev_chapters = chapter3.xpath('preceding-sibling::chapter')
|
|
246
|
+
puts prev_chapters.length # => 2
|
|
247
|
+
|
|
248
|
+
# Ancestor
|
|
249
|
+
chapter = doc.at_xpath('//chapter[@id="2"]')
|
|
250
|
+
book = chapter.xpath('ancestor::book').first
|
|
251
|
+
puts book.name # => "book"
|
|
252
|
+
|
|
253
|
+
# Descendant
|
|
254
|
+
all_text = doc.root.xpath('descendant::text()')
|
|
255
|
+
----
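The sibling axes can be pictured as slices of the parent's child list. The sketch below is a plain-Ruby model, with an array of chapter titles standing in for the sample document rather than real nodes. It also shows one XPath 1.0 detail: `preceding-sibling` is a reverse axis, so `preceding-sibling::chapter[1]` is the nearest earlier sibling, not the first chapter.

```ruby
# Chapter titles in document order, standing in for <chapter> elements.
chapters = %w[Introduction Basics Advanced]

context_index = chapters.index('Advanced')

# following-sibling::chapter -> everything after the context node
following = chapters.drop(context_index + 1)

# preceding-sibling::chapter -> everything before the context node
preceding = chapters.take(context_index)

# preceding-sibling::chapter[1] -> positions count backwards from the context node
nearest_preceding = preceding.reverse.first

p following         # => []
p preceding         # => ["Introduction", "Basics"]
p nearest_preceding # => "Basics"
```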
|
|
256
|
+
|
|
257
|
+
=== Adapter-specific considerations
|
|
258
|
+
|
|
259
|
+
==== Nokogiri, LibXML, Oga
|
|
260
|
+
|
|
261
|
+
Full XPath 1.0 support; the examples above are expected to run unchanged.
|
|
262
|
+
|
|
263
|
+
==== REXML
|
|
264
|
+
|
|
265
|
+
Limited support - namespace queries don't work:
|
|
266
|
+
|
|
267
|
+
[source,ruby]
|
|
268
|
+
----
|
|
269
|
+
# Works
|
|
270
|
+
doc.xpath('//book')
|
|
271
|
+
doc.xpath('//book[@id="1"]')
|
|
272
|
+
|
|
273
|
+
# Does NOT work
|
|
274
|
+
doc.xpath('//ns:book', namespaces) # ❌
|
|
275
|
+
|
|
276
|
+
# Workaround
|
|
277
|
+
books = doc.xpath('//book')
|
|
278
|
+
books.select { |b| b.namespace&.uri == 'http://example.org' }
|
|
279
|
+
----
|
|
280
|
+
|
|
281
|
+
==== Ox
|
|
282
|
+
|
|
283
|
+
Basic paths only - use Ruby for complex filtering:
|
|
284
|
+
|
|
285
|
+
[source,ruby]
|
|
286
|
+
----
|
|
287
|
+
# Works
|
|
288
|
+
all_books = doc.xpath('//book')
|
|
289
|
+
|
|
290
|
+
# Does NOT work - use Ruby instead
|
|
291
|
+
# doc.xpath('//book[@id="1"]') # ❌
|
|
292
|
+
|
|
293
|
+
# Workaround
|
|
294
|
+
books = doc.xpath('//book')
|
|
295
|
+
book1 = books.find { |b| b['id'] == '1' }
|
|
296
|
+
----
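The same Ruby-side filtering can stand in for text predicates such as `contains()`, `starts-with()`, and `string-length()`. The sketch below uses plain strings in place of node text so that it runs without any adapter. With real nodes you would first collect the text of each matched element, for example with `.text`.

```ruby
# Titles as they might come back from calling .text on each matched node.
titles = ['Ruby Programming', 'Python Programming', 'Advanced Ruby']

# contains(text(), "Ruby")
containing = titles.select { |t| t.include?('Ruby') }

# starts-with(text(), "Ruby")
starting = titles.select { |t| t.start_with?('Ruby') }

# string-length(text()) > 14
long_titles = titles.select { |t| t.length > 14 }

p containing  # => ["Ruby Programming", "Advanced Ruby"]
p starting    # => ["Ruby Programming"]
p long_titles # => ["Ruby Programming", "Python Programming"]
```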
|
|
297
|
+
|
|
298
|
+
=== Best practices
|
|
299
|
+
|
|
300
|
+
. **Use specific paths** when possible for better performance
|
|
301
|
+
. **Cache XPath results** if querying multiple times
|
|
302
|
+
. **Choose the right adapter** for your XPath needs
|
|
303
|
+
. **Test namespace queries** with your chosen adapter
|
|
304
|
+
. **Use relative paths** (`.//`) when querying from elements
|
|
305
|
+
|
|
306
|
+
=== Common patterns
|
|
307
|
+
|
|
308
|
+
==== Extract structured data
|
|
309
|
+
|
|
310
|
+
[source,ruby]
|
|
311
|
+
----
|
|
312
|
+
products = doc.xpath('//product').map do |product|
|
|
313
|
+
{
|
|
314
|
+
id: product['id'],
|
|
315
|
+
name: product.at_xpath('.//name').text,
|
|
316
|
+
price: product.at_xpath('.//price').text.to_f,
|
|
317
|
+
stock: product.at_xpath('.//stock').text.to_i
|
|
318
|
+
}
|
|
319
|
+
end
|
|
320
|
+
----
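The extraction pattern above raises `NoMethodError` when a child element is missing, because `.text` is then called on `nil`. Below is a minimal nil-safe sketch of the same idea. It deliberately uses REXML from the Ruby distribution instead of Moxml so that it runs without extra gems. The `<catalog>` sample and the `child_text` helper are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical sample: the second product has no <stock> element.
xml = <<~XML
  <catalog>
    <product id="p1"><name>Widget</name><price>9.50</price><stock>4</stock></product>
    <product id="p2"><name>Gadget</name><price>20</price></product>
  </catalog>
XML

doc = REXML::Document.new(xml)

# Return the text of the first matching child, or nil when it is absent.
child_text = ->(node, name) { REXML::XPath.first(node, name)&.text }

products = REXML::XPath.match(doc, '//product').map do |product|
  {
    id: product.attributes['id'],
    name: child_text.(product, 'name'),
    price: child_text.(product, 'price')&.to_f, # nil stays nil instead of raising
    stock: child_text.(product, 'stock')&.to_i
  }
end

puts products.length            # => 2
puts products[0][:price]        # => 9.5
puts products[1][:stock].inspect # => nil
```

With Moxml the equivalent guard would presumably be `product.at_xpath('.//name')&.text`, assuming `at_xpath` returns `nil` when nothing matches.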
|
|
321
|
+
|
|
322
|
+
==== Conditional processing
|
|
323
|
+
|
|
324
|
+
[source,ruby]
|
|
325
|
+
----
|
|
326
|
+
# Find and process matching elements
|
|
327
|
+
expensive_books = doc.xpath('//book[price > 30]')
|
|
328
|
+
expensive_books.each do |book|
|
|
329
|
+
# Apply discount
|
|
330
|
+
price = book.at_xpath('.//price')
|
|
331
|
+
current = price.text.to_f
|
|
332
|
+
price.text = (current * 0.9).to_s
|
|
333
|
+
end
|
|
334
|
+
----
|
|
335
|
+
|
|
336
|
+
==== Validation
|
|
337
|
+
|
|
338
|
+
[source,ruby]
|
|
339
|
+
----
|
|
340
|
+
# Check required elements exist
|
|
341
|
+
required = ['title', 'author', 'isbn']
|
|
342
|
+
missing = required.reject { |elem| doc.at_xpath("//#{elem}") }
|
|
343
|
+
|
|
344
|
+
if missing.any?
|
|
345
|
+
raise "Missing elements: #{missing.join(', ')}"
|
|
346
|
+
end
|
|
347
|
+
----
|
|
348
|
+
|
|
349
|
+
=== Next steps
|
|
350
|
+
|
|
351
|
+
* link:namespace-handling[Namespace handling] - Master XML namespaces
|
|
352
|
+
* link:working-with-elements[Working with elements] - Element manipulation
|
|
353
|
+
* link:../guides/xpath-queries[XPath queries guide] - Advanced patterns
|
|
354
|
+
|
|
355
|
+
=== See also
|
|
356
|
+
|
|
357
|
+
* link:../pages/adapters/[Adapters] - XPath capabilities per adapter
|
|
358
|
+
* link:../references/xpath-api[XPath API reference]
|
|
359
|
+
* link:../pages/compatibility[Compatibility matrix] - XPath support comparison
|
data/docs/index.adoc
ADDED
|
@@ -0,0 +1,122 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: Home
|
|
3
|
+
nav_order: 1
|
|
4
|
+
---
|
|
5
|
+
:page-layout: home
|
|
6
|
+
|
|
7
|
+
== Moxml: Modern XML processing for Ruby
|
|
8
|
+
|
|
9
|
+
image:https://github.com/lutaml/moxml/workflows/rake/badge.svg["Build Status", link="https://github.com/lutaml/moxml/actions?workflow=rake"]
|
|
10
|
+
image:https://img.shields.io/gem/v/moxml.svg["Gem Version", link="https://rubygems.org/gems/moxml"]
|
|
11
|
+
|
|
12
|
+
Moxml provides a unified, modern XML processing interface for Ruby
|
|
13
|
+
applications. It offers a consistent API that abstracts away the underlying
|
|
14
|
+
XML implementation details while maintaining high performance through
|
|
15
|
+
efficient node mapping and native XPath querying.
|
|
16
|
+
|
|
17
|
+
== Key features
|
|
18
|
+
|
|
19
|
+
* **Unified interface** - Consistent API across different XML libraries
|
|
20
|
+
* **Multiple adapters** - Support for Nokogiri, LibXML, Oga, REXML, Ox, and HeadedOx
|
|
21
|
+
* **Full XPath support** - Native XPath querying with namespace handling
|
|
22
|
+
* **SAX parsing** - Memory-efficient event-driven XML processing
|
|
23
|
+
* **Ruby-idiomatic** - Natural, expressive API for XML manipulation
|
|
24
|
+
* **High performance** - Efficient node mapping and adapter optimization
|
|
25
|
+
* **Thread-safe** - Safe for concurrent operations
|
|
26
|
+
* **Complete node support** - Handle all XML node types
|
|
27
|
+
|
|
28
|
+
== Quick example
|
|
29
|
+
|
|
30
|
+
[source,ruby]
|
|
31
|
+
----
|
|
32
|
+
require 'moxml'
|
|
33
|
+
|
|
34
|
+
# Parse XML
|
|
35
|
+
doc = Moxml.new.parse('<library><book id="1">Ruby</book></library>')
|
|
36
|
+
|
|
37
|
+
# Query with XPath
|
|
38
|
+
books = doc.xpath('//book[@id="1"]')
|
|
39
|
+
puts books.first.text # => "Ruby"
|
|
40
|
+
|
|
41
|
+
# Modify content
|
|
42
|
+
books.first.text = 'Advanced Ruby'
|
|
43
|
+
books.first['edition'] = '2nd'
|
|
44
|
+
|
|
45
|
+
# Output formatted XML
|
|
46
|
+
puts doc.to_xml(indent: 2)
|
|
47
|
+
----
|
|
48
|
+
|
|
49
|
+
== Getting started
|
|
50
|
+
|
|
51
|
+
Choose your path:
|
|
52
|
+
|
|
53
|
+
[.grid]
|
|
54
|
+
--
|
|
55
|
+
link:pages/installation[**Installation**]::
|
|
56
|
+
Install Moxml and choose an XML adapter
|
|
57
|
+
|
|
58
|
+
link:tutorials/[**Tutorials**]::
|
|
59
|
+
Step-by-step guides to learn Moxml
|
|
60
|
+
|
|
61
|
+
link:guides/[**Guides**]::
|
|
62
|
+
Task-oriented documentation for common operations
|
|
63
|
+
|
|
64
|
+
link:pages/adapters/[**Adapters**]::
|
|
65
|
+
Learn about supported XML libraries
|
|
66
|
+
--
|
|
67
|
+
|
|
68
|
+
== Supported adapters
|
|
69
|
+
|
|
70
|
+
[cols="2,3,1"]
|
|
71
|
+
|===
|
|
72
|
+
| Adapter | Best for | Performance
|
|
73
|
+
|
|
74
|
+
| link:pages/adapters/nokogiri[Nokogiri]
|
|
75
|
+
| Industry standard, wide compatibility
|
|
76
|
+
| Fast
|
|
77
|
+
|
|
78
|
+
| link:pages/adapters/libxml[LibXML]
|
|
79
|
+
| Alternative to Nokogiri, full features
|
|
80
|
+
| Very Fast
|
|
81
|
+
|
|
82
|
+
| link:pages/adapters/oga[Oga]
|
|
83
|
+
| Pure Ruby, no C extensions
|
|
84
|
+
| Fast
|
|
85
|
+
|
|
86
|
+
| link:pages/adapters/rexml[REXML]
|
|
87
|
+
| Standard library, maximum portability
|
|
88
|
+
| Medium
|
|
89
|
+
|
|
90
|
+
| link:pages/adapters/ox[Ox]
|
|
91
|
+
| Maximum speed, simple documents
|
|
92
|
+
| Very Fast
|
|
93
|
+
|
|
94
|
+
| link:pages/adapters/headed-ox[HeadedOx]
|
|
95
|
+
| Fast parsing + comprehensive XPath 1.0 (99.20% pass rate)
|
|
96
|
+
| Very Fast
|
|
97
|
+
|===
|
|
98
|
+
|
|
99
|
+
== Why Moxml?
|
|
100
|
+
|
|
101
|
+
**Consistent API**:: Write once, run with any XML library. Switch adapters
|
|
102
|
+
without changing your code.
|
|
103
|
+
|
|
104
|
+
**Full XPath support**:: Powerful querying with namespace awareness on most
|
|
105
|
+
adapters; REXML and Ox have documented XPath limitations.
|
|
106
|
+
|
|
107
|
+
**Modern design**:: Clean, Ruby-idiomatic interface that feels natural to use.
|
|
108
|
+
|
|
109
|
+
**Production ready**:: Comprehensive error handling, thread safety, and
|
|
110
|
+
performance optimization.
|
|
111
|
+
|
|
112
|
+
== Community
|
|
113
|
+
|
|
114
|
+
* link:https://github.com/lutaml/moxml[GitHub repository]
|
|
115
|
+
* link:https://github.com/lutaml/moxml/issues[Issue tracker]
|
|
116
|
+
* link:https://rubygems.org/gems/moxml[RubyGems page]
|
|
117
|
+
|
|
118
|
+
== License
|
|
119
|
+
|
|
120
|
+
Moxml is released under the Ribose 3-Clause BSD License.
|
|
121
|
+
|
|
122
|
+
Copyright (c) 2025 Ribose Inc.
|
data/examples/README.md
ADDED
|
@@ -0,0 +1,124 @@
|
|
|
1
|
+
# Moxml Real-World Examples
|
|
2
|
+
|
|
3
|
+
This directory contains practical, runnable examples demonstrating moxml usage in common real-world scenarios.
|
|
4
|
+
|
|
5
|
+
## Overview
|
|
6
|
+
|
|
7
|
+
Each example demonstrates different aspects of moxml's capabilities:
|
|
8
|
+
|
|
9
|
+
- **RSS Parser**: Parse RSS/Atom feeds with XPath queries and namespace handling
|
|
10
|
+
- **Web Scraper**: Extract data from HTML/XML using DOM navigation
|
|
11
|
+
- **API Client**: Build and parse XML API requests/responses
|
|
12
|
+
|
|
13
|
+
## Requirements
|
|
14
|
+
|
|
15
|
+
All examples require moxml and at least one XML adapter:
|
|
16
|
+
|
|
17
|
+
```bash
|
|
18
|
+
gem install moxml nokogiri
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
## Running the Examples
|
|
22
|
+
|
|
23
|
+
Each example is self-contained and can be run directly:
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
# RSS Parser Example
|
|
27
|
+
ruby examples/rss_parser/rss_parser.rb
|
|
28
|
+
|
|
29
|
+
# Web Scraper Example
|
|
30
|
+
ruby examples/web_scraper/web_scraper.rb
|
|
31
|
+
|
|
32
|
+
# API Client Example
|
|
33
|
+
ruby examples/api_client/api_client.rb
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
## Example Details
|
|
37
|
+
|
|
38
|
+
### RSS Parser (`rss_parser/`)
|
|
39
|
+
|
|
40
|
+
Demonstrates:
|
|
41
|
+
- Parsing RSS/Atom feed XML
|
|
42
|
+
- XPath queries for data extraction
|
|
43
|
+
- Namespace handling
|
|
44
|
+
- Element traversal
|
|
45
|
+
- Attribute access
|
|
46
|
+
|
|
47
|
+
**Files:**
|
|
48
|
+
- `rss_parser.rb` - Main parser implementation
|
|
49
|
+
- `example_feed.xml` - Sample RSS feed
|
|
50
|
+
- `README.md` - Detailed documentation
|
|
51
|
+
|
|
52
|
+
### Web Scraper (`web_scraper/`)
|
|
53
|
+
|
|
54
|
+
Demonstrates:
|
|
55
|
+
- HTML/XML document parsing
|
|
56
|
+
- Table data extraction
|
|
57
|
+
- DOM structure navigation
|
|
58
|
+
- Attribute and text content access
|
|
59
|
+
- Error handling
|
|
60
|
+
|
|
61
|
+
**Files:**
|
|
62
|
+
- `web_scraper.rb` - Main scraper implementation
|
|
63
|
+
- `example_page.html` - Sample HTML page
|
|
64
|
+
- `README.md` - Detailed documentation
|
|
65
|
+
|
|
66
|
+
### API Client (`api_client/`)
|
|
67
|
+
|
|
68
|
+
Demonstrates:
|
|
69
|
+
- Building XML API requests
|
|
70
|
+
- Parsing XML API responses
|
|
71
|
+
- SOAP message handling
|
|
72
|
+
- Authentication elements
|
|
73
|
+
- Error response processing
|
|
74
|
+
|
|
75
|
+
**Files:**
|
|
76
|
+
- `api_client.rb` - Main client implementation
|
|
77
|
+
- `example_response.xml` - Sample API response
|
|
78
|
+
- `README.md` - Detailed documentation
|
|
79
|
+
|
|
80
|
+
## Key Concepts
|
|
81
|
+
|
|
82
|
+
### Using require_relative
|
|
83
|
+
|
|
84
|
+
All examples use `require_relative` to load moxml from the local source:
|
|
85
|
+
|
|
86
|
+
```ruby
|
|
87
|
+
require_relative '../../lib/moxml'
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
This allows running the examples directly from the repository without installing the gem.
|
|
91
|
+
|
|
92
|
+
### Error Handling
|
|
93
|
+
|
|
94
|
+
Each example includes comprehensive error handling:
|
|
95
|
+
|
|
96
|
+
```ruby
|
|
97
|
+
begin
|
|
98
|
+
doc = Moxml.new.parse(xml)
|
|
99
|
+
rescue Moxml::ParseError => e
|
|
100
|
+
puts "Parse error: #{e.message}"
|
|
101
|
+
exit 1
|
|
102
|
+
end
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
### Best Practices
|
|
106
|
+
|
|
107
|
+
The examples demonstrate moxml best practices:
|
|
108
|
+
- Proper namespace handling
|
|
109
|
+
- Efficient XPath queries
|
|
110
|
+
- Clean resource management
|
|
111
|
+
- Comprehensive error handling
|
|
112
|
+
- Clear, commented code
|
|
113
|
+
|
|
114
|
+
## Learning Path
|
|
115
|
+
|
|
116
|
+
1. **Start with RSS Parser** - Learn basic parsing and XPath
|
|
117
|
+
2. **Move to Web Scraper** - Understand DOM navigation
|
|
118
|
+
3. **Finish with API Client** - Master XML generation and complex structures
|
|
119
|
+
|
|
120
|
+
## Additional Resources
|
|
121
|
+
|
|
122
|
+
- [Main README](../README.adoc) - Complete moxml documentation
|
|
123
|
+
- [API Reference](../docs/) - Detailed API documentation
|
|
124
|
+
- [Guides](../docs/_guides/) - Task-oriented guides
|