rubyjedi-oga 1.0.3

Files changed (58)
  1. checksums.yaml +7 -0
  2. data/.yardopts +13 -0
  3. data/LICENSE +362 -0
  4. data/README.md +317 -0
  5. data/doc/css/common.css +77 -0
  6. data/doc/css_selectors.md +935 -0
  7. data/doc/manually_creating_documents.md +67 -0
  8. data/doc/migrating_from_nokogiri.md +169 -0
  9. data/doc/xml_namespaces.md +63 -0
  10. data/ext/c/extconf.rb +11 -0
  11. data/ext/c/lexer.c +2595 -0
  12. data/ext/c/lexer.h +16 -0
  13. data/ext/c/lexer.rl +198 -0
  14. data/ext/c/liboga.c +6 -0
  15. data/ext/c/liboga.h +11 -0
  16. data/ext/java/Liboga.java +14 -0
  17. data/ext/java/org/liboga/xml/Lexer.java +1363 -0
  18. data/ext/java/org/liboga/xml/Lexer.rl +223 -0
  19. data/ext/ragel/base_lexer.rl +633 -0
  20. data/lib/oga.rb +57 -0
  21. data/lib/oga/blacklist.rb +40 -0
  22. data/lib/oga/css/lexer.rb +743 -0
  23. data/lib/oga/css/parser.rb +976 -0
  24. data/lib/oga/entity_decoder.rb +21 -0
  25. data/lib/oga/html/entities.rb +2150 -0
  26. data/lib/oga/html/parser.rb +25 -0
  27. data/lib/oga/html/sax_parser.rb +18 -0
  28. data/lib/oga/lru.rb +160 -0
  29. data/lib/oga/oga.rb +57 -0
  30. data/lib/oga/version.rb +3 -0
  31. data/lib/oga/whitelist.rb +20 -0
  32. data/lib/oga/xml/attribute.rb +136 -0
  33. data/lib/oga/xml/cdata.rb +17 -0
  34. data/lib/oga/xml/character_node.rb +37 -0
  35. data/lib/oga/xml/comment.rb +17 -0
  36. data/lib/oga/xml/default_namespace.rb +13 -0
  37. data/lib/oga/xml/doctype.rb +82 -0
  38. data/lib/oga/xml/document.rb +108 -0
  39. data/lib/oga/xml/element.rb +428 -0
  40. data/lib/oga/xml/entities.rb +122 -0
  41. data/lib/oga/xml/html_void_elements.rb +15 -0
  42. data/lib/oga/xml/lexer.rb +550 -0
  43. data/lib/oga/xml/namespace.rb +48 -0
  44. data/lib/oga/xml/node.rb +219 -0
  45. data/lib/oga/xml/node_set.rb +333 -0
  46. data/lib/oga/xml/parser.rb +631 -0
  47. data/lib/oga/xml/processing_instruction.rb +37 -0
  48. data/lib/oga/xml/pull_parser.rb +175 -0
  49. data/lib/oga/xml/querying.rb +56 -0
  50. data/lib/oga/xml/sax_parser.rb +192 -0
  51. data/lib/oga/xml/text.rb +66 -0
  52. data/lib/oga/xml/traversal.rb +50 -0
  53. data/lib/oga/xml/xml_declaration.rb +65 -0
  54. data/lib/oga/xpath/evaluator.rb +1798 -0
  55. data/lib/oga/xpath/lexer.rb +1958 -0
  56. data/lib/oga/xpath/parser.rb +622 -0
  57. data/oga.gemspec +45 -0
  58. metadata +227 -0
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d5ee55c04377dd30ae94fbe33556d4d535f27cc6
4
+ data.tar.gz: 82522f8cb52c9511e16930b60e9e7e3eb12aa0e0
5
+ SHA512:
6
+ metadata.gz: a8d082defeb61a5e2338a8e772694ba46e266bfb43def0aadf7ca73c7806385d0e1dd47d5e244fdecae6b7b29d9f7c5dfb1d6a65af6315a0a2120d5d86da6328
7
+ data.tar.gz: 2c9302cfb0bff98375b7b4ca946ce47bda444628bc53a5b5cb7c9832ea39b9e6224a27d8be22f6aca4a9b85743f8b03048e8b454cd46e8428a75b08a52ea6326
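The digests above cover the two archive members of the `.gem` file (`metadata.gz` and `data.tar.gz`). They can be checked by hand roughly as follows. This is a minimal sketch, assuming the `.gem` archive has already been unpacked and its members read into memory; `verify_checksums` and its parameter names are hypothetical and not part of RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Maps the algorithm names used in checksums.yaml to stdlib digest classes.
DIGESTS = {
  'SHA1'   => Digest::SHA1,
  'SHA512' => Digest::SHA512
}.freeze

# Hypothetical helper (not a RubyGems API).
# checksums_yaml: the text of checksums.yaml.
# members: Hash mapping a member name (e.g. "metadata.gz") to its raw bytes.
# Returns a Hash of [algorithm, member name] => true/false.
def verify_checksums(checksums_yaml, members)
  expected = YAML.safe_load(checksums_yaml)
  results  = {}

  expected.each do |algorithm, digests|
    digest_class = DIGESTS[algorithm]
    next unless digest_class # skip algorithms this sketch does not know

    digests.each do |name, hex|
      next unless members.key?(name)

      actual = digest_class.hexdigest(members[name])
      results[[algorithm, name]] = (actual == hex.to_s)
    end
  end

  results
end
```

Any `false` value in the returned Hash would mean that the bytes of that member do not match the published digest.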
@@ -0,0 +1,13 @@
1
+ ./lib/oga/**/*.rb ./lib/oga.rb
2
+ -m markdown
3
+ -M kramdown
4
+ -o yardoc
5
+ -r ./README.md
6
+ --protected
7
+ --asset ./doc/css/common.css:css/common.css
8
+ --verbose
9
+ -
10
+ ./doc/*.md
11
+ LICENSE
12
+ CONTRIBUTING.md
13
+ CHANGELOG.md
data/LICENSE ADDED
@@ -0,0 +1,362 @@
1
+ Mozilla Public License, version 2.0
2
+
3
+ 1. Definitions
4
+
5
+ 1.1. "Contributor"
6
+
7
+ means each individual or legal entity that creates, contributes to the
8
+ creation of, or owns Covered Software.
9
+
10
+ 1.2. "Contributor Version"
11
+
12
+ means the combination of the Contributions of others (if any) used by a
13
+ Contributor and that particular Contributor's Contribution.
14
+
15
+ 1.3. "Contribution"
16
+
17
+ means Covered Software of a particular Contributor.
18
+
19
+ 1.4. "Covered Software"
20
+
21
+ means Source Code Form to which the initial Contributor has attached the
22
+ notice in Exhibit A, the Executable Form of such Source Code Form, and
23
+ Modifications of such Source Code Form, in each case including portions
24
+ thereof.
25
+
26
+ 1.5. "Incompatible With Secondary Licenses"
27
+ means
28
+
29
+ a. that the initial Contributor has attached the notice described in
30
+ Exhibit B to the Covered Software; or
31
+
32
+ b. that the Covered Software was made available under the terms of
33
+ version 1.1 or earlier of the License, but not also under the terms of
34
+ a Secondary License.
35
+
36
+ 1.6. "Executable Form"
37
+
38
+ means any form of the work other than Source Code Form.
39
+
40
+ 1.7. "Larger Work"
41
+
42
+ means a work that combines Covered Software with other material, in a
43
+ separate file or files, that is not Covered Software.
44
+
45
+ 1.8. "License"
46
+
47
+ means this document.
48
+
49
+ 1.9. "Licensable"
50
+
51
+ means having the right to grant, to the maximum extent possible, whether
52
+ at the time of the initial grant or subsequently, any and all of the
53
+ rights conveyed by this License.
54
+
55
+ 1.10. "Modifications"
56
+
57
+ means any of the following:
58
+
59
+ a. any file in Source Code Form that results from an addition to,
60
+ deletion from, or modification of the contents of Covered Software; or
61
+
62
+ b. any new file in Source Code Form that contains any Covered Software.
63
+
64
+ 1.11. "Patent Claims" of a Contributor
65
+
66
+ means any patent claim(s), including without limitation, method,
67
+ process, and apparatus claims, in any patent Licensable by such
68
+ Contributor that would be infringed, but for the grant of the License,
69
+ by the making, using, selling, offering for sale, having made, import,
70
+ or transfer of either its Contributions or its Contributor Version.
71
+
72
+ 1.12. "Secondary License"
73
+
74
+ means either the GNU General Public License, Version 2.0, the GNU Lesser
75
+ General Public License, Version 2.1, the GNU Affero General Public
76
+ License, Version 3.0, or any later versions of those licenses.
77
+
78
+ 1.13. "Source Code Form"
79
+
80
+ means the form of the work preferred for making modifications.
81
+
82
+ 1.14. "You" (or "Your")
83
+
84
+ means an individual or a legal entity exercising rights under this
85
+ License. For legal entities, "You" includes any entity that controls, is
86
+ controlled by, or is under common control with You. For purposes of this
87
+ definition, "control" means (a) the power, direct or indirect, to cause
88
+ the direction or management of such entity, whether by contract or
89
+ otherwise, or (b) ownership of more than fifty percent (50%) of the
90
+ outstanding shares or beneficial ownership of such entity.
91
+
92
+
93
+ 2. License Grants and Conditions
94
+
95
+ 2.1. Grants
96
+
97
+ Each Contributor hereby grants You a world-wide, royalty-free,
98
+ non-exclusive license:
99
+
100
+ a. under intellectual property rights (other than patent or trademark)
101
+ Licensable by such Contributor to use, reproduce, make available,
102
+ modify, display, perform, distribute, and otherwise exploit its
103
+ Contributions, either on an unmodified basis, with Modifications, or
104
+ as part of a Larger Work; and
105
+
106
+ b. under Patent Claims of such Contributor to make, use, sell, offer for
107
+ sale, have made, import, and otherwise transfer either its
108
+ Contributions or its Contributor Version.
109
+
110
+ 2.2. Effective Date
111
+
112
+ The licenses granted in Section 2.1 with respect to any Contribution
113
+ become effective for each Contribution on the date the Contributor first
114
+ distributes such Contribution.
115
+
116
+ 2.3. Limitations on Grant Scope
117
+
118
+ The licenses granted in this Section 2 are the only rights granted under
119
+ this License. No additional rights or licenses will be implied from the
120
+ distribution or licensing of Covered Software under this License.
121
+ Notwithstanding Section 2.1(b) above, no patent license is granted by a
122
+ Contributor:
123
+
124
+ a. for any code that a Contributor has removed from Covered Software; or
125
+
126
+ b. for infringements caused by: (i) Your and any other third party's
127
+ modifications of Covered Software, or (ii) the combination of its
128
+ Contributions with other software (except as part of its Contributor
129
+ Version); or
130
+
131
+ c. under Patent Claims infringed by Covered Software in the absence of
132
+ its Contributions.
133
+
134
+ This License does not grant any rights in the trademarks, service marks,
135
+ or logos of any Contributor (except as may be necessary to comply with
136
+ the notice requirements in Section 3.4).
137
+
138
+ 2.4. Subsequent Licenses
139
+
140
+ No Contributor makes additional grants as a result of Your choice to
141
+ distribute the Covered Software under a subsequent version of this
142
+ License (see Section 10.2) or under the terms of a Secondary License (if
143
+ permitted under the terms of Section 3.3).
144
+
145
+ 2.5. Representation
146
+
147
+ Each Contributor represents that the Contributor believes its
148
+ Contributions are its original creation(s) or it has sufficient rights to
149
+ grant the rights to its Contributions conveyed by this License.
150
+
151
+ 2.6. Fair Use
152
+
153
+ This License is not intended to limit any rights You have under
154
+ applicable copyright doctrines of fair use, fair dealing, or other
155
+ equivalents.
156
+
157
+ 2.7. Conditions
158
+
159
+ Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in
160
+ Section 2.1.
161
+
162
+
163
+ 3. Responsibilities
164
+
165
+ 3.1. Distribution of Source Form
166
+
167
+ All distribution of Covered Software in Source Code Form, including any
168
+ Modifications that You create or to which You contribute, must be under
169
+ the terms of this License. You must inform recipients that the Source
170
+ Code Form of the Covered Software is governed by the terms of this
171
+ License, and how they can obtain a copy of this License. You may not
172
+ attempt to alter or restrict the recipients' rights in the Source Code
173
+ Form.
174
+
175
+ 3.2. Distribution of Executable Form
176
+
177
+ If You distribute Covered Software in Executable Form then:
178
+
179
+ a. such Covered Software must also be made available in Source Code Form,
180
+ as described in Section 3.1, and You must inform recipients of the
181
+ Executable Form how they can obtain a copy of such Source Code Form by
182
+ reasonable means in a timely manner, at a charge no more than the cost
183
+ of distribution to the recipient; and
184
+
185
+ b. You may distribute such Executable Form under the terms of this
186
+ License, or sublicense it under different terms, provided that the
187
+ license for the Executable Form does not attempt to limit or alter the
188
+ recipients' rights in the Source Code Form under this License.
189
+
190
+ 3.3. Distribution of a Larger Work
191
+
192
+ You may create and distribute a Larger Work under terms of Your choice,
193
+ provided that You also comply with the requirements of this License for
194
+ the Covered Software. If the Larger Work is a combination of Covered
195
+ Software with a work governed by one or more Secondary Licenses, and the
196
+ Covered Software is not Incompatible With Secondary Licenses, this
197
+ License permits You to additionally distribute such Covered Software
198
+ under the terms of such Secondary License(s), so that the recipient of
199
+ the Larger Work may, at their option, further distribute the Covered
200
+ Software under the terms of either this License or such Secondary
201
+ License(s).
202
+
203
+ 3.4. Notices
204
+
205
+ You may not remove or alter the substance of any license notices
206
+ (including copyright notices, patent notices, disclaimers of warranty, or
207
+ limitations of liability) contained within the Source Code Form of the
208
+ Covered Software, except that You may alter any license notices to the
209
+ extent required to remedy known factual inaccuracies.
210
+
211
+ 3.5. Application of Additional Terms
212
+
213
+ You may choose to offer, and to charge a fee for, warranty, support,
214
+ indemnity or liability obligations to one or more recipients of Covered
215
+ Software. However, You may do so only on Your own behalf, and not on
216
+ behalf of any Contributor. You must make it absolutely clear that any
217
+ such warranty, support, indemnity, or liability obligation is offered by
218
+ You alone, and You hereby agree to indemnify every Contributor for any
219
+ liability incurred by such Contributor as a result of warranty, support,
220
+ indemnity or liability terms You offer. You may include additional
221
+ disclaimers of warranty and limitations of liability specific to any
222
+ jurisdiction.
223
+
224
+ 4. Inability to Comply Due to Statute or Regulation
225
+
226
+ If it is impossible for You to comply with any of the terms of this License
227
+ with respect to some or all of the Covered Software due to statute,
228
+ judicial order, or regulation then You must: (a) comply with the terms of
229
+ this License to the maximum extent possible; and (b) describe the
230
+ limitations and the code they affect. Such description must be placed in a
231
+ text file included with all distributions of the Covered Software under
232
+ this License. Except to the extent prohibited by statute or regulation,
233
+ such description must be sufficiently detailed for a recipient of ordinary
234
+ skill to be able to understand it.
235
+
236
+ 5. Termination
237
+
238
+ 5.1. The rights granted under this License will terminate automatically if You
239
+ fail to comply with any of its terms. However, if You become compliant,
240
+ then the rights granted under this License from a particular Contributor
241
+ are reinstated (a) provisionally, unless and until such Contributor
242
+ explicitly and finally terminates Your grants, and (b) on an ongoing
243
+ basis, if such Contributor fails to notify You of the non-compliance by
244
+ some reasonable means prior to 60 days after You have come back into
245
+ compliance. Moreover, Your grants from a particular Contributor are
246
+ reinstated on an ongoing basis if such Contributor notifies You of the
247
+ non-compliance by some reasonable means, this is the first time You have
248
+ received notice of non-compliance with this License from such
249
+ Contributor, and You become compliant prior to 30 days after Your receipt
250
+ of the notice.
251
+
252
+ 5.2. If You initiate litigation against any entity by asserting a patent
253
+ infringement claim (excluding declaratory judgment actions,
254
+ counter-claims, and cross-claims) alleging that a Contributor Version
255
+ directly or indirectly infringes any patent, then the rights granted to
256
+ You by any and all Contributors for the Covered Software under Section
257
+ 2.1 of this License shall terminate.
258
+
259
+ 5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user
260
+ license agreements (excluding distributors and resellers) which have been
261
+ validly granted by You or Your distributors under this License prior to
262
+ termination shall survive termination.
263
+
264
+ 6. Disclaimer of Warranty
265
+
266
+ Covered Software is provided under this License on an "as is" basis,
267
+ without warranty of any kind, either expressed, implied, or statutory,
268
+ including, without limitation, warranties that the Covered Software is free
269
+ of defects, merchantable, fit for a particular purpose or non-infringing.
270
+ The entire risk as to the quality and performance of the Covered Software
271
+ is with You. Should any Covered Software prove defective in any respect,
272
+ You (not any Contributor) assume the cost of any necessary servicing,
273
+ repair, or correction. This disclaimer of warranty constitutes an essential
274
+ part of this License. No use of any Covered Software is authorized under
275
+ this License except under this disclaimer.
276
+
277
+ 7. Limitation of Liability
278
+
279
+ Under no circumstances and under no legal theory, whether tort (including
280
+ negligence), contract, or otherwise, shall any Contributor, or anyone who
281
+ distributes Covered Software as permitted above, be liable to You for any
282
+ direct, indirect, special, incidental, or consequential damages of any
283
+ character including, without limitation, damages for lost profits, loss of
284
+ goodwill, work stoppage, computer failure or malfunction, or any and all
285
+ other commercial damages or losses, even if such party shall have been
286
+ informed of the possibility of such damages. This limitation of liability
287
+ shall not apply to liability for death or personal injury resulting from
288
+ such party's negligence to the extent applicable law prohibits such
289
+ limitation. Some jurisdictions do not allow the exclusion or limitation of
290
+ incidental or consequential damages, so this exclusion and limitation may
291
+ not apply to You.
292
+
293
+ 8. Litigation
294
+
295
+ Any litigation relating to this License may be brought only in the courts
296
+ of a jurisdiction where the defendant maintains its principal place of
297
+ business and such litigation shall be governed by laws of that
298
+ jurisdiction, without reference to its conflict-of-law provisions. Nothing
299
+ in this Section shall prevent a party's ability to bring cross-claims or
300
+ counter-claims.
301
+
302
+ 9. Miscellaneous
303
+
304
+ This License represents the complete agreement concerning the subject
305
+ matter hereof. If any provision of this License is held to be
306
+ unenforceable, such provision shall be reformed only to the extent
307
+ necessary to make it enforceable. Any law or regulation which provides that
308
+ the language of a contract shall be construed against the drafter shall not
309
+ be used to construe this License against a Contributor.
310
+
311
+
312
+ 10. Versions of the License
313
+
314
+ 10.1. New Versions
315
+
316
+ Mozilla Foundation is the license steward. Except as provided in Section
317
+ 10.3, no one other than the license steward has the right to modify or
318
+ publish new versions of this License. Each version will be given a
319
+ distinguishing version number.
320
+
321
+ 10.2. Effect of New Versions
322
+
323
+ You may distribute the Covered Software under the terms of the version
324
+ of the License under which You originally received the Covered Software,
325
+ or under the terms of any subsequent version published by the license
326
+ steward.
327
+
328
+ 10.3. Modified Versions
329
+
330
+ If you create software not governed by this License, and you want to
331
+ create a new license for such software, you may create and use a
332
+ modified version of this License if you rename the license and remove
333
+ any references to the name of the license steward (except to note that
334
+ such modified license differs from this License).
335
+
336
+ 10.4. Distributing Source Code Form that is Incompatible With Secondary
337
+ Licenses If You choose to distribute Source Code Form that is
338
+ Incompatible With Secondary Licenses under the terms of this version of
339
+ the License, the notice described in Exhibit B of this License must be
340
+ attached.
341
+
342
+ Exhibit A - Source Code Form License Notice
343
+
344
+ This Source Code Form is subject to the
345
+ terms of the Mozilla Public License, v.
346
+ 2.0. If a copy of the MPL was not
347
+ distributed with this file, You can
348
+ obtain one at
349
+ http://mozilla.org/MPL/2.0/.
350
+
351
+ If it is not possible or desirable to put the notice in a particular file,
352
+ then You may include the notice in a location (such as a LICENSE file in a
353
+ relevant directory) where a recipient would be likely to look for such a
354
+ notice.
355
+
356
+ You may add additional accurate notices of copyright ownership.
357
+
358
+ Exhibit B - "Incompatible With Secondary Licenses" Notice
359
+
360
+ This Source Code Form is "Incompatible
361
+ With Secondary Licenses", as defined by
362
+ the Mozilla Public License, v. 2.0.
@@ -0,0 +1,317 @@
1
+ # Oga
2
+
3
+ Oga is an XML/HTML parser written in Ruby. It provides an easy to use API for
4
+ parsing, modifying and querying documents (using XPath expressions). Oga does
5
+ not require system libraries such as libxml, making it easier and faster to
6
+ install on various platforms. To achieve better performance Oga uses a small,
7
+ native extension (C for MRI/Rubinius, Java for JRuby).
8
+
9
+ Oga provides an API that allows you to safely parse and query documents in a
10
+ multi-threaded environment, without having to worry about your applications
11
+ blowing up.
12
+
13
+ From [Wikipedia][oga-wikipedia]:
14
+
15
+ > Oga: A large two-person saw used for ripping large boards in the days before
16
+ > power saws. One person stood on a raised platform, with the board below him,
17
+ > and the other person stood underneath them.
18
+
19
+ The name is a pun on [Nokogiri][nokogiri].
20
+
21
+ Oga uses [Semantic Versioning 2.0][semver] as its versioning scheme. All
22
+ classes, modules and methods are part of the public API _unless_ they are
23
+ declared as private using Ruby's `private` keyword or YARD's `@api private` tag.
24
+
25
+ ## Examples
26
+
27
+ Parsing a simple string of XML:
28
+
29
+ Oga.parse_xml('<people><person>Alice</person></people>')
30
+
31
+ Parsing XML using strict mode (disables automatic tag insertion):
32
+
33
+ Oga.parse_xml('<people>foo</people>', :strict => true) # works fine
34
+ Oga.parse_xml('<people>foo', :strict => true) # raises an error
35
+
36
+ Parsing a simple string of HTML:
37
+
38
+ Oga.parse_html('<link rel="stylesheet" href="foo.css">')
39
+
40
+ Parsing an IO handle pointing to XML (this also works when using
41
+ `Oga.parse_html`):
42
+
43
+ handle = File.open('path/to/file.xml')
44
+
45
+ Oga.parse_xml(handle)
46
+
47
+ Parsing an IO handle using the pull parser:
48
+
49
+ handle = File.open('path/to/file.xml')
50
+ parser = Oga::XML::PullParser.new(handle)
51
+
52
+ parser.parse do |node|
53
+   parser.on(:text) do
54
+     puts node.text
55
+   end
56
+ end
57
+
58
+ Using an Enumerator to download and parse an XML document on the fly:
59
+
60
+ enum = Enumerator.new do |yielder|
61
+   HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
62
+     yielder << chunk
63
+   end
64
+ end
65
+
66
+ document = Oga.parse_xml(enum)
67
+
68
+ Parse a string of XML using the SAX parser:
69
+
70
+ class ElementNames
71
+   attr_reader :names
72
+
73
+   def initialize
74
+     @names = []
75
+   end
76
+
77
+   def on_element(namespace, name, attrs = {})
78
+     @names << name
79
+   end
80
+ end
81
+
82
+ handler = ElementNames.new
83
+
84
+ Oga.sax_parse_xml(handler, '<foo><bar></bar></foo>')
85
+
86
+ handler.names # => ["foo", "bar"]
87
+
88
+ Querying a document using XPath:
89
+
90
+ document = Oga.parse_xml <<-EOF
91
+ <people>
92
+ <person id="1">
93
+ <name>Alice</name>
94
+ <age>28</age>
95
+ </person>
96
+ </people>
97
+ EOF
98
+
99
+ # The "xpath" method returns an enumerable (Oga::XML::NodeSet) that you can
100
+ # iterate over.
101
+ document.xpath('people/person').each do |person|
102
+   puts person.get('id') # => "1"
103
+
104
+   # The "at_xpath" method returns a single node from a set, it's the same as
105
+   # person.xpath('name').first.
106
+   puts person.at_xpath('name').text # => "Alice"
107
+ end
108
+
109
+ Querying the same document using CSS:
110
+
111
+ document = Oga.parse_xml <<-EOF
112
+ <people>
113
+ <person id="1">
114
+ <name>Alice</name>
115
+ <age>28</age>
116
+ </person>
117
+ </people>
118
+ EOF
119
+
120
+ # The "css" method returns an enumerable (Oga::XML::NodeSet) that you can
121
+ # iterate over.
122
+ document.css('people person').each do |person|
123
+   puts person.get('id') # => "1"
124
+
125
+   # The "at_css" method returns a single node from a set, it's the same as
126
+   # person.css('name').first.
127
+   puts person.at_css('name').text # => "Alice"
128
+ end
129
+
130
+ Modifying a document and serializing it back to XML:
131
+
132
+ document = Oga.parse_xml('<people><person>Alice</person></people>')
133
+ name = document.at_xpath('people/person[1]/text()')
134
+
135
+ name.text = 'Bob'
136
+
137
+ document.to_xml # => "<people><person>Bob</person></people>"
138
+
139
+ Querying a document using a namespace:
140
+
141
+ document = Oga.parse_xml('<root xmlns:x="foo"><x:div></x:div></root>')
142
+ div = document.xpath('root/x:div').first
143
+
144
+ div.namespace # => Namespace(name: "x" uri: "foo")
145
+
146
+ ## Features
147
+
148
+ * Support for parsing XML and HTML(5)
149
+ * DOM parsing
150
+ * Stream/pull parsing
151
+ * SAX parsing
152
+ * Low memory footprint
153
+ * High performance, if something doesn't perform well enough it's a bug
154
+ * Support for XPath 1.0
155
+ * CSS3 selector support
156
+ * XML namespace support (registering, querying, etc)
157
+
158
+ ## Requirements
159
+
160
+ | Ruby | Required | Recommended |
161
+ |:---------|:--------------|:------------|
162
+ | MRI | >= 1.9.3 | >= 2.1.2 |
163
+ | Rubinius | >= 2.2 | >= 2.2.10 |
164
+ | JRuby | >= 1.7 | >= 1.7.12 |
165
+ | Maglev | Not supported | |
166
+ | Topaz | Not supported | |
167
+ | mruby | Not supported | |
168
+
169
+ Maglev and Topaz are not supported due to the lack of a C API (that I know of)
170
+ and the lack of active development of these Ruby implementations. mruby is not
171
+ supported because it's a very different implementation altogether.
172
+
173
+ To install Oga on MRI or Rubinius you'll need to have a working compiler such as
174
+ gcc or clang. Oga's C extension can be compiled with both. JRuby does not
175
+ require a compiler as the native extension is compiled during the Gem building
176
+ process and bundled inside the Gem itself.
177
+
178
+ ## Thread Safety
179
+
180
+ Documents parsed using Oga are thread-safe as long as they are not modified by
181
+ multiple threads at the same time. Querying documents using XPath can be done by
182
+ multiple threads just fine. Write operations, such as removing attributes, are
183
+ _not_ thread-safe and should not be done by multiple threads at once.
184
+
185
+ It is advised that you do not share parsed documents between threads unless you
186
+ _really_ have to.
187
+
188
+ ## Namespace Support
189
+
190
+ Oga fully supports parsing/registering XML namespaces as well as querying them
191
+ using XPath. For example, take the following XML:
192
+
193
+ <root xmlns="http://example.com">
194
+ <bar>bar</bar>
195
+ </root>
196
+
197
+ If one were to try and query the `bar` element (e.g. using XPath `root/bar`)
198
+ they'd end up with an empty node set. This is due to `<root>` defining an
199
+ alternative default namespace. Instead you can query this element using the
200
+ following XPath:
201
+
202
+ *[local-name() = "root"]/*[local-name() = "bar"]
203
+
204
+ Alternatively, if you don't really care where the `<bar>` element is located you
205
+ can use the following:
206
+
207
+ descendant::*[local-name() = "bar"]
208
+
209
+ And if you want to specify an explicit namespace URI, you can use this:
210
+
211
+ descendant::*[local-name() = "bar" and namespace-uri() = "http://example.com"]
212
+
213
+ Unlike Nokogiri, Oga does _not_ provide a way to create "dynamic" namespaces.
214
+ That is, Nokogiri allows one to query the above document as follows:
215
+
216
+ document = Nokogiri::XML('<root xmlns="http://example.com"><bar>bar</bar></root>')
217
+
218
+ document.xpath('x:root/x:bar', :x => 'http://example.com')
219
+
220
+ Oga does have a small trick you can use to cut down the size of your XPath
221
+ queries. Because Oga assigns the name "xmlns" to default namespaces you can use
222
+ this in your XPath queries:
223
+
224
+ document = Oga.parse_xml('<root xmlns="http://example.com"><bar>bar</bar></root>')
225
+
226
+ document.xpath('xmlns:root/xmlns:bar')
227
+
228
+ When using this you can still restrict the query to the correct namespace URI:
229
+
230
+ document.xpath('xmlns:root[namespace-uri() = "http://example.com"]/xmlns:bar')
231
+
232
+ In the future I might add an API to ease this process, although at this time I
233
+ have little interest in providing an API similar to Nokogiri.
234
+
235
+ ## HTML5 Support
236
+
237
+ Oga fully supports HTML5 including the omission of certain tags. For example,
238
+ the following is parsed just fine:
239
+
240
+ <li>Hello
241
+ <li>World
242
+
243
+ This is effectively parsed into:
244
+
245
+ <li>Hello</li>
246
+ <li>World</li>
247
+
248
+ One exception Oga makes is that it does _not_ automatically insert `html`,
249
+ `head` and `body` tags. Automatically inserting these tags requires a
250
+ distinction between documents and fragments as a user might not always want
251
+ these tags to be inserted if left out. This complicates the user facing API as
252
+ well as complicating the parsing internals of Oga. As a result I have decided
253
+ that Oga _does not_ insert these tags when left out.
254
+
255
+ A more in depth explanation can be found here:
256
+ <https://github.com/YorickPeterse/oga/issues/98#issuecomment-96833066>.
257
+
258
+ ## Documentation
259
+
260
+ The documentation is best viewed [on the documentation website][doc-website].
261
+
262
+ * {file:CONTRIBUTING Contributing}
263
+ * {file:changelog Changelog}
264
+ * {file:migrating\_from\_nokogiri Migrating From Nokogiri}
265
+ * {Oga::XML::Parser XML Parser}
266
+ * {Oga::XML::SaxParser XML SAX Parser}
267
+ * {file:xml\_namespaces XML Namespaces}
268
+
269
+ ## Why Another HTML/XML parser?
270
+
271
+ Currently there are a few existing parsers out there, the most famous one being
272
+ [Nokogiri][nokogiri]. Another parser that's becoming more popular these days is
273
+ [Ox][ox]. Ruby's standard library also comes with REXML.
274
+
275
+ The sad truth is that these existing libraries are problematic in their own
276
+ ways. Nokogiri for example is extremely unstable on Rubinius. On MRI it works
277
+ because of the non-concurrent nature of MRI, on JRuby it works because it's
278
+ implemented in Java. Nokogiri also uses libxml2 which is a massive beast of a
279
+ library, is not thread-safe and problematic to install on certain platforms
280
+ (apparently). I don't want to compile libxml2 every time I install Nokogiri
281
+ either.
282
+
283
+ To give an example about the issues with Nokogiri on Rubinius (or any other
284
+ Ruby implementation that is not MRI or JRuby), take a look at these issues:
285
+
286
+ * <https://github.com/rubinius/rubinius/issues/2957>
287
+ * <https://github.com/rubinius/rubinius/issues/2908>
288
+ * <https://github.com/rubinius/rubinius/issues/2462>
289
+ * <https://github.com/sparklemotion/nokogiri/issues/1047>
290
+ * <https://github.com/sparklemotion/nokogiri/issues/939>
291
+
292
+ Some of these have been fixed, some have not. The core problem remains:
293
+ Nokogiri acts in a way that there can be a large number of places where it
294
+ *might* break due to throwing around void pointers and what not and expecting
295
+ that things magically work. Note that I have nothing against the people running
296
+ these projects, I just heavily, *heavily* dislike the resulting codebase one
297
+ has to deal with today.
298
+
299
+ Ox looks very promising but it lacks a rather crucial feature: parsing HTML
300
+ (without using a SAX API). It's also again a C extension making debugging more
301
+ of a pain (at least for me).
302
+
303
+ I just want an XML/HTML parser that I can rely on stability wise and that is
304
+ written in Ruby so I can actually debug it. In theory it should also make it
305
+ easier for other Ruby developers to contribute.
306
+
307
+ ## License
308
+
309
+ All source code in this repository is subject to the terms of the Mozilla Public
310
+ License, version 2.0 unless stated otherwise. A copy of this license can be
311
+ found in the file "LICENSE" or at <https://www.mozilla.org/MPL/2.0/>.
312
+
313
+ [nokogiri]: https://github.com/sparklemotion/nokogiri
314
+ [oga-wikipedia]: https://en.wikipedia.org/wiki/Japanese_saw#Other_Japanese_saws
315
+ [ox]: https://github.com/ohler55/ox
316
+ [doc-website]: http://code.yorickpeterse.com/oga/latest/
317
+ [semver]: http://semver.org/spec/v2.0.0.html
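The Thread Safety section of the README above says that write operations should not be performed by multiple threads at once. One way to follow that advice is to route every mutation through a single lock. This is a minimal sketch, assuming a hypothetical `GuardedDocument` wrapper that is not part of Oga; a plain Hash stands in for the parsed document so the example runs without the gem installed.

```ruby
# Hypothetical helper (not part of Oga): serializes all mutations of a shared
# object behind one Mutex.
class GuardedDocument
  def initialize(document)
    @document = document
    @lock     = Mutex.new
  end

  # Yields the wrapped object while holding the lock, so only one thread can
  # modify it at a time. Returns the value of the block.
  def modify
    @lock.synchronize { yield @document }
  end
end

# Stand-in for a parsed document. With Oga this would be the object returned
# by Oga.parse_xml (an assumption of this sketch, not shown here).
shared = GuardedDocument.new({ 'edits' => 0 })

# Four threads each apply 1000 modifications through the lock.
threads = 4.times.map do
  Thread.new do
    1000.times { shared.modify { |doc| doc['edits'] += 1 } }
  end
end
threads.each(&:join)

total = shared.modify { |doc| doc['edits'] }
puts total
```

Reads that may overlap with a write would presumably need to go through the same lock. Not sharing the document between threads at all, as the README recommends, avoids the question.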