rubyjedi-oga 1.0.3
- checksums.yaml +7 -0
- data/.yardopts +13 -0
- data/LICENSE +362 -0
- data/README.md +317 -0
- data/doc/css/common.css +77 -0
- data/doc/css_selectors.md +935 -0
- data/doc/manually_creating_documents.md +67 -0
- data/doc/migrating_from_nokogiri.md +169 -0
- data/doc/xml_namespaces.md +63 -0
- data/ext/c/extconf.rb +11 -0
- data/ext/c/lexer.c +2595 -0
- data/ext/c/lexer.h +16 -0
- data/ext/c/lexer.rl +198 -0
- data/ext/c/liboga.c +6 -0
- data/ext/c/liboga.h +11 -0
- data/ext/java/Liboga.java +14 -0
- data/ext/java/org/liboga/xml/Lexer.java +1363 -0
- data/ext/java/org/liboga/xml/Lexer.rl +223 -0
- data/ext/ragel/base_lexer.rl +633 -0
- data/lib/oga.rb +57 -0
- data/lib/oga/blacklist.rb +40 -0
- data/lib/oga/css/lexer.rb +743 -0
- data/lib/oga/css/parser.rb +976 -0
- data/lib/oga/entity_decoder.rb +21 -0
- data/lib/oga/html/entities.rb +2150 -0
- data/lib/oga/html/parser.rb +25 -0
- data/lib/oga/html/sax_parser.rb +18 -0
- data/lib/oga/lru.rb +160 -0
- data/lib/oga/oga.rb +57 -0
- data/lib/oga/version.rb +3 -0
- data/lib/oga/whitelist.rb +20 -0
- data/lib/oga/xml/attribute.rb +136 -0
- data/lib/oga/xml/cdata.rb +17 -0
- data/lib/oga/xml/character_node.rb +37 -0
- data/lib/oga/xml/comment.rb +17 -0
- data/lib/oga/xml/default_namespace.rb +13 -0
- data/lib/oga/xml/doctype.rb +82 -0
- data/lib/oga/xml/document.rb +108 -0
- data/lib/oga/xml/element.rb +428 -0
- data/lib/oga/xml/entities.rb +122 -0
- data/lib/oga/xml/html_void_elements.rb +15 -0
- data/lib/oga/xml/lexer.rb +550 -0
- data/lib/oga/xml/namespace.rb +48 -0
- data/lib/oga/xml/node.rb +219 -0
- data/lib/oga/xml/node_set.rb +333 -0
- data/lib/oga/xml/parser.rb +631 -0
- data/lib/oga/xml/processing_instruction.rb +37 -0
- data/lib/oga/xml/pull_parser.rb +175 -0
- data/lib/oga/xml/querying.rb +56 -0
- data/lib/oga/xml/sax_parser.rb +192 -0
- data/lib/oga/xml/text.rb +66 -0
- data/lib/oga/xml/traversal.rb +50 -0
- data/lib/oga/xml/xml_declaration.rb +65 -0
- data/lib/oga/xpath/evaluator.rb +1798 -0
- data/lib/oga/xpath/lexer.rb +1958 -0
- data/lib/oga/xpath/parser.rb +622 -0
- data/oga.gemspec +45 -0
- metadata +227 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA1:
|
3
|
+
metadata.gz: d5ee55c04377dd30ae94fbe33556d4d535f27cc6
|
4
|
+
data.tar.gz: 82522f8cb52c9511e16930b60e9e7e3eb12aa0e0
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: a8d082defeb61a5e2338a8e772694ba46e266bfb43def0aadf7ca73c7806385d0e1dd47d5e244fdecae6b7b29d9f7c5dfb1d6a65af6315a0a2120d5d86da6328
|
7
|
+
data.tar.gz: 2c9302cfb0bff98375b7b4ca946ce47bda444628bc53a5b5cb7c9832ea39b9e6224a27d8be22f6aca4a9b85743f8b03048e8b454cd46e8428a75b08a52ea6326
|
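The `checksums.yaml` hunk above records SHA1 and SHA512 digests for the two archives inside the gem (`metadata.gz` and `data.tar.gz`). As a minimal sketch, assuming the archive bytes have already been read into memory, such a file could be checked with Ruby's standard library alone. The `verify_checksums` helper is hypothetical and is not part of RubyGems or Oga.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Hypothetical helper (not part of RubyGems): compares the digests listed in
# a checksums.yaml document against the actual bytes of each file.
#
# yaml_text - the contents of checksums.yaml as a String
# files     - Hash mapping file names (e.g. "data.tar.gz") to their bytes
#
# Returns a Hash of "ALGORITHM/file name" => true or false.
def verify_checksums(yaml_text, files)
  digests = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }
  results = {}

  YAML.safe_load(yaml_text).each do |algorithm, entries|
    digest = digests.fetch(algorithm)

    entries.each do |name, expected|
      actual = digest.hexdigest(files.fetch(name))
      results["#{algorithm}/#{name}"] = (actual == expected.to_s)
    end
  end

  results
end
```

Any entry that maps to `false` would indicate that the file's bytes do not match the digest recorded in the YAML.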
data/.yardopts
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,362 @@
|
|
1
|
+
Mozilla Public License, version 2.0
|
2
|
+
|
3
|
+
1. Definitions
|
4
|
+
|
5
|
+
1.1. "Contributor"
|
6
|
+
|
7
|
+
means each individual or legal entity that creates, contributes to the
|
8
|
+
creation of, or owns Covered Software.
|
9
|
+
|
10
|
+
1.2. "Contributor Version"
|
11
|
+
|
12
|
+
means the combination of the Contributions of others (if any) used by a
|
13
|
+
Contributor and that particular Contributor's Contribution.
|
14
|
+
|
15
|
+
1.3. "Contribution"
|
16
|
+
|
17
|
+
means Covered Software of a particular Contributor.
|
18
|
+
|
19
|
+
1.4. "Covered Software"
|
20
|
+
|
21
|
+
means Source Code Form to which the initial Contributor has attached the
|
22
|
+
notice in Exhibit A, the Executable Form of such Source Code Form, and
|
23
|
+
Modifications of such Source Code Form, in each case including portions
|
24
|
+
thereof.
|
25
|
+
|
26
|
+
1.5. "Incompatible With Secondary Licenses"
|
27
|
+
means
|
28
|
+
|
29
|
+
a. that the initial Contributor has attached the notice described in
|
30
|
+
Exhibit B to the Covered Software; or
|
31
|
+
|
32
|
+
b. that the Covered Software was made available under the terms of
|
33
|
+
version 1.1 or earlier of the License, but not also under the terms of
|
34
|
+
a Secondary License.
|
35
|
+
|
36
|
+
1.6. "Executable Form"
|
37
|
+
|
38
|
+
means any form of the work other than Source Code Form.
|
39
|
+
|
40
|
+
1.7. "Larger Work"
|
41
|
+
|
42
|
+
means a work that combines Covered Software with other material, in a
|
43
|
+
separate file or files, that is not Covered Software.
|
44
|
+
|
45
|
+
1.8. "License"
|
46
|
+
|
47
|
+
means this document.
|
48
|
+
|
49
|
+
1.9. "Licensable"
|
50
|
+
|
51
|
+
means having the right to grant, to the maximum extent possible, whether
|
52
|
+
at the time of the initial grant or subsequently, any and all of the
|
53
|
+
rights conveyed by this License.
|
54
|
+
|
55
|
+
1.10. "Modifications"
|
56
|
+
|
57
|
+
means any of the following:
|
58
|
+
|
59
|
+
a. any file in Source Code Form that results from an addition to,
|
60
|
+
deletion from, or modification of the contents of Covered Software; or
|
61
|
+
|
62
|
+
b. any new file in Source Code Form that contains any Covered Software.
|
63
|
+
|
64
|
+
1.11. "Patent Claims" of a Contributor
|
65
|
+
|
66
|
+
means any patent claim(s), including without limitation, method,
|
67
|
+
process, and apparatus claims, in any patent Licensable by such
|
68
|
+
Contributor that would be infringed, but for the grant of the License,
|
69
|
+
by the making, using, selling, offering for sale, having made, import,
|
70
|
+
or transfer of either its Contributions or its Contributor Version.
|
71
|
+
|
72
|
+
1.12. "Secondary License"
|
73
|
+
|
74
|
+
means either the GNU General Public License, Version 2.0, the GNU Lesser
|
75
|
+
General Public License, Version 2.1, the GNU Affero General Public
|
76
|
+
License, Version 3.0, or any later versions of those licenses.
|
77
|
+
|
78
|
+
1.13. "Source Code Form"
|
79
|
+
|
80
|
+
means the form of the work preferred for making modifications.
|
81
|
+
|
82
|
+
1.14. "You" (or "Your")
|
83
|
+
|
84
|
+
means an individual or a legal entity exercising rights under this
|
85
|
+
License. For legal entities, "You" includes any entity that controls, is
|
86
|
+
controlled by, or is under common control with You. For purposes of this
|
87
|
+
definition, "control" means (a) the power, direct or indirect, to cause
|
88
|
+
the direction or management of such entity, whether by contract or
|
89
|
+
otherwise, or (b) ownership of more than fifty percent (50%) of the
|
90
|
+
outstanding shares or beneficial ownership of such entity.
|
91
|
+
|
92
|
+
|
93
|
+
2. License Grants and Conditions
|
94
|
+
|
95
|
+
2.1. Grants
|
96
|
+
|
97
|
+
Each Contributor hereby grants You a world-wide, royalty-free,
|
98
|
+
non-exclusive license:
|
99
|
+
|
100
|
+
a. under intellectual property rights (other than patent or trademark)
|
101
|
+
Licensable by such Contributor to use, reproduce, make available,
|
102
|
+
modify, display, perform, distribute, and otherwise exploit its
|
103
|
+
Contributions, either on an unmodified basis, with Modifications, or
|
104
|
+
as part of a Larger Work; and
|
105
|
+
|
106
|
+
b. under Patent Claims of such Contributor to make, use, sell, offer for
|
107
|
+
sale, have made, import, and otherwise transfer either its
|
108
|
+
Contributions or its Contributor Version.
|
109
|
+
|
110
|
+
2.2. Effective Date
|
111
|
+
|
112
|
+
The licenses granted in Section 2.1 with respect to any Contribution
|
113
|
+
become effective for each Contribution on the date the Contributor first
|
114
|
+
distributes such Contribution.
|
115
|
+
|
116
|
+
2.3. Limitations on Grant Scope
|
117
|
+
|
118
|
+
The licenses granted in this Section 2 are the only rights granted under
|
119
|
+
this License. No additional rights or licenses will be implied from the
|
120
|
+
distribution or licensing of Covered Software under this License.
|
121
|
+
Notwithstanding Section 2.1(b) above, no patent license is granted by a
|
122
|
+
Contributor:
|
123
|
+
|
124
|
+
a. for any code that a Contributor has removed from Covered Software; or
|
125
|
+
|
126
|
+
b. for infringements caused by: (i) Your and any other third party's
|
127
|
+
modifications of Covered Software, or (ii) the combination of its
|
128
|
+
Contributions with other software (except as part of its Contributor
|
129
|
+
Version); or
|
130
|
+
|
131
|
+
c. under Patent Claims infringed by Covered Software in the absence of
|
132
|
+
its Contributions.
|
133
|
+
|
134
|
+
This License does not grant any rights in the trademarks, service marks,
|
135
|
+
or logos of any Contributor (except as may be necessary to comply with
|
136
|
+
the notice requirements in Section 3.4).
|
137
|
+
|
138
|
+
2.4. Subsequent Licenses
|
139
|
+
|
140
|
+
No Contributor makes additional grants as a result of Your choice to
|
141
|
+
distribute the Covered Software under a subsequent version of this
|
142
|
+
License (see Section 10.2) or under the terms of a Secondary License (if
|
143
|
+
permitted under the terms of Section 3.3).
|
144
|
+
|
145
|
+
2.5. Representation
|
146
|
+
|
147
|
+
Each Contributor represents that the Contributor believes its
|
148
|
+
Contributions are its original creation(s) or it has sufficient rights to
|
149
|
+
grant the rights to its Contributions conveyed by this License.
|
150
|
+
|
151
|
+
2.6. Fair Use
|
152
|
+
|
153
|
+
This License is not intended to limit any rights You have under
|
154
|
+
applicable copyright doctrines of fair use, fair dealing, or other
|
155
|
+
equivalents.
|
156
|
+
|
157
|
+
2.7. Conditions
|
158
|
+
|
159
|
+
Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in
|
160
|
+
Section 2.1.
|
161
|
+
|
162
|
+
|
163
|
+
3. Responsibilities
|
164
|
+
|
165
|
+
3.1. Distribution of Source Form
|
166
|
+
|
167
|
+
All distribution of Covered Software in Source Code Form, including any
|
168
|
+
Modifications that You create or to which You contribute, must be under
|
169
|
+
the terms of this License. You must inform recipients that the Source
|
170
|
+
Code Form of the Covered Software is governed by the terms of this
|
171
|
+
License, and how they can obtain a copy of this License. You may not
|
172
|
+
attempt to alter or restrict the recipients' rights in the Source Code
|
173
|
+
Form.
|
174
|
+
|
175
|
+
3.2. Distribution of Executable Form
|
176
|
+
|
177
|
+
If You distribute Covered Software in Executable Form then:
|
178
|
+
|
179
|
+
a. such Covered Software must also be made available in Source Code Form,
|
180
|
+
as described in Section 3.1, and You must inform recipients of the
|
181
|
+
Executable Form how they can obtain a copy of such Source Code Form by
|
182
|
+
reasonable means in a timely manner, at a charge no more than the cost
|
183
|
+
of distribution to the recipient; and
|
184
|
+
|
185
|
+
b. You may distribute such Executable Form under the terms of this
|
186
|
+
License, or sublicense it under different terms, provided that the
|
187
|
+
license for the Executable Form does not attempt to limit or alter the
|
188
|
+
recipients' rights in the Source Code Form under this License.
|
189
|
+
|
190
|
+
3.3. Distribution of a Larger Work
|
191
|
+
|
192
|
+
You may create and distribute a Larger Work under terms of Your choice,
|
193
|
+
provided that You also comply with the requirements of this License for
|
194
|
+
the Covered Software. If the Larger Work is a combination of Covered
|
195
|
+
Software with a work governed by one or more Secondary Licenses, and the
|
196
|
+
Covered Software is not Incompatible With Secondary Licenses, this
|
197
|
+
License permits You to additionally distribute such Covered Software
|
198
|
+
under the terms of such Secondary License(s), so that the recipient of
|
199
|
+
the Larger Work may, at their option, further distribute the Covered
|
200
|
+
Software under the terms of either this License or such Secondary
|
201
|
+
License(s).
|
202
|
+
|
203
|
+
3.4. Notices
|
204
|
+
|
205
|
+
You may not remove or alter the substance of any license notices
|
206
|
+
(including copyright notices, patent notices, disclaimers of warranty, or
|
207
|
+
limitations of liability) contained within the Source Code Form of the
|
208
|
+
Covered Software, except that You may alter any license notices to the
|
209
|
+
extent required to remedy known factual inaccuracies.
|
210
|
+
|
211
|
+
3.5. Application of Additional Terms
|
212
|
+
|
213
|
+
You may choose to offer, and to charge a fee for, warranty, support,
|
214
|
+
indemnity or liability obligations to one or more recipients of Covered
|
215
|
+
Software. However, You may do so only on Your own behalf, and not on
|
216
|
+
behalf of any Contributor. You must make it absolutely clear that any
|
217
|
+
such warranty, support, indemnity, or liability obligation is offered by
|
218
|
+
You alone, and You hereby agree to indemnify every Contributor for any
|
219
|
+
liability incurred by such Contributor as a result of warranty, support,
|
220
|
+
indemnity or liability terms You offer. You may include additional
|
221
|
+
disclaimers of warranty and limitations of liability specific to any
|
222
|
+
jurisdiction.
|
223
|
+
|
224
|
+
4. Inability to Comply Due to Statute or Regulation
|
225
|
+
|
226
|
+
If it is impossible for You to comply with any of the terms of this License
|
227
|
+
with respect to some or all of the Covered Software due to statute,
|
228
|
+
judicial order, or regulation then You must: (a) comply with the terms of
|
229
|
+
this License to the maximum extent possible; and (b) describe the
|
230
|
+
limitations and the code they affect. Such description must be placed in a
|
231
|
+
text file included with all distributions of the Covered Software under
|
232
|
+
this License. Except to the extent prohibited by statute or regulation,
|
233
|
+
such description must be sufficiently detailed for a recipient of ordinary
|
234
|
+
skill to be able to understand it.
|
235
|
+
|
236
|
+
5. Termination
|
237
|
+
|
238
|
+
5.1. The rights granted under this License will terminate automatically if You
|
239
|
+
fail to comply with any of its terms. However, if You become compliant,
|
240
|
+
then the rights granted under this License from a particular Contributor
|
241
|
+
are reinstated (a) provisionally, unless and until such Contributor
|
242
|
+
explicitly and finally terminates Your grants, and (b) on an ongoing
|
243
|
+
basis, if such Contributor fails to notify You of the non-compliance by
|
244
|
+
some reasonable means prior to 60 days after You have come back into
|
245
|
+
compliance. Moreover, Your grants from a particular Contributor are
|
246
|
+
reinstated on an ongoing basis if such Contributor notifies You of the
|
247
|
+
non-compliance by some reasonable means, this is the first time You have
|
248
|
+
received notice of non-compliance with this License from such
|
249
|
+
Contributor, and You become compliant prior to 30 days after Your receipt
|
250
|
+
of the notice.
|
251
|
+
|
252
|
+
5.2. If You initiate litigation against any entity by asserting a patent
|
253
|
+
infringement claim (excluding declaratory judgment actions,
|
254
|
+
counter-claims, and cross-claims) alleging that a Contributor Version
|
255
|
+
directly or indirectly infringes any patent, then the rights granted to
|
256
|
+
You by any and all Contributors for the Covered Software under Section
|
257
|
+
2.1 of this License shall terminate.
|
258
|
+
|
259
|
+
5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user
|
260
|
+
license agreements (excluding distributors and resellers) which have been
|
261
|
+
validly granted by You or Your distributors under this License prior to
|
262
|
+
termination shall survive termination.
|
263
|
+
|
264
|
+
6. Disclaimer of Warranty
|
265
|
+
|
266
|
+
Covered Software is provided under this License on an "as is" basis,
|
267
|
+
without warranty of any kind, either expressed, implied, or statutory,
|
268
|
+
including, without limitation, warranties that the Covered Software is free
|
269
|
+
of defects, merchantable, fit for a particular purpose or non-infringing.
|
270
|
+
The entire risk as to the quality and performance of the Covered Software
|
271
|
+
is with You. Should any Covered Software prove defective in any respect,
|
272
|
+
You (not any Contributor) assume the cost of any necessary servicing,
|
273
|
+
repair, or correction. This disclaimer of warranty constitutes an essential
|
274
|
+
part of this License. No use of any Covered Software is authorized under
|
275
|
+
this License except under this disclaimer.
|
276
|
+
|
277
|
+
7. Limitation of Liability
|
278
|
+
|
279
|
+
Under no circumstances and under no legal theory, whether tort (including
|
280
|
+
negligence), contract, or otherwise, shall any Contributor, or anyone who
|
281
|
+
distributes Covered Software as permitted above, be liable to You for any
|
282
|
+
direct, indirect, special, incidental, or consequential damages of any
|
283
|
+
character including, without limitation, damages for lost profits, loss of
|
284
|
+
goodwill, work stoppage, computer failure or malfunction, or any and all
|
285
|
+
other commercial damages or losses, even if such party shall have been
|
286
|
+
informed of the possibility of such damages. This limitation of liability
|
287
|
+
shall not apply to liability for death or personal injury resulting from
|
288
|
+
such party's negligence to the extent applicable law prohibits such
|
289
|
+
limitation. Some jurisdictions do not allow the exclusion or limitation of
|
290
|
+
incidental or consequential damages, so this exclusion and limitation may
|
291
|
+
not apply to You.
|
292
|
+
|
293
|
+
8. Litigation
|
294
|
+
|
295
|
+
Any litigation relating to this License may be brought only in the courts
|
296
|
+
of a jurisdiction where the defendant maintains its principal place of
|
297
|
+
business and such litigation shall be governed by laws of that
|
298
|
+
jurisdiction, without reference to its conflict-of-law provisions. Nothing
|
299
|
+
in this Section shall prevent a party's ability to bring cross-claims or
|
300
|
+
counter-claims.
|
301
|
+
|
302
|
+
9. Miscellaneous
|
303
|
+
|
304
|
+
This License represents the complete agreement concerning the subject
|
305
|
+
matter hereof. If any provision of this License is held to be
|
306
|
+
unenforceable, such provision shall be reformed only to the extent
|
307
|
+
necessary to make it enforceable. Any law or regulation which provides that
|
308
|
+
the language of a contract shall be construed against the drafter shall not
|
309
|
+
be used to construe this License against a Contributor.
|
310
|
+
|
311
|
+
|
312
|
+
10. Versions of the License
|
313
|
+
|
314
|
+
10.1. New Versions
|
315
|
+
|
316
|
+
Mozilla Foundation is the license steward. Except as provided in Section
|
317
|
+
10.3, no one other than the license steward has the right to modify or
|
318
|
+
publish new versions of this License. Each version will be given a
|
319
|
+
distinguishing version number.
|
320
|
+
|
321
|
+
10.2. Effect of New Versions
|
322
|
+
|
323
|
+
You may distribute the Covered Software under the terms of the version
|
324
|
+
of the License under which You originally received the Covered Software,
|
325
|
+
or under the terms of any subsequent version published by the license
|
326
|
+
steward.
|
327
|
+
|
328
|
+
10.3. Modified Versions
|
329
|
+
|
330
|
+
If you create software not governed by this License, and you want to
|
331
|
+
create a new license for such software, you may create and use a
|
332
|
+
modified version of this License if you rename the license and remove
|
333
|
+
any references to the name of the license steward (except to note that
|
334
|
+
such modified license differs from this License).
|
335
|
+
|
336
|
+
10.4. Distributing Source Code Form that is Incompatible With Secondary
|
337
|
+
Licenses If You choose to distribute Source Code Form that is
|
338
|
+
Incompatible With Secondary Licenses under the terms of this version of
|
339
|
+
the License, the notice described in Exhibit B of this License must be
|
340
|
+
attached.
|
341
|
+
|
342
|
+
Exhibit A - Source Code Form License Notice
|
343
|
+
|
344
|
+
This Source Code Form is subject to the
|
345
|
+
terms of the Mozilla Public License, v.
|
346
|
+
2.0. If a copy of the MPL was not
|
347
|
+
distributed with this file, You can
|
348
|
+
obtain one at
|
349
|
+
http://mozilla.org/MPL/2.0/.
|
350
|
+
|
351
|
+
If it is not possible or desirable to put the notice in a particular file,
|
352
|
+
then You may include the notice in a location (such as a LICENSE file in a
|
353
|
+
relevant directory) where a recipient would be likely to look for such a
|
354
|
+
notice.
|
355
|
+
|
356
|
+
You may add additional accurate notices of copyright ownership.
|
357
|
+
|
358
|
+
Exhibit B - "Incompatible With Secondary Licenses" Notice
|
359
|
+
|
360
|
+
This Source Code Form is "Incompatible
|
361
|
+
With Secondary Licenses", as defined by
|
362
|
+
the Mozilla Public License, v. 2.0.
|
data/README.md
ADDED
@@ -0,0 +1,317 @@
|
|
1
|
+
# Oga
|
2
|
+
|
3
|
+
Oga is an XML/HTML parser written in Ruby. It provides an easy to use API for
|
4
|
+
parsing, modifying and querying documents (using XPath expressions). Oga does
|
5
|
+
not require system libraries such as libxml, making it easier and faster to
|
6
|
+
install on various platforms. To achieve better performance Oga uses a small,
|
7
|
+
native extension (C for MRI/Rubinius, Java for JRuby).
|
8
|
+
|
9
|
+
Oga provides an API that allows you to safely parse and query documents in a
|
10
|
+
multi-threaded environment, without having to worry about your applications
|
11
|
+
blowing up.
|
12
|
+
|
13
|
+
From [Wikipedia][oga-wikipedia]:
|
14
|
+
|
15
|
+
> Oga: A large two-person saw used for ripping large boards in the days before
|
16
|
+
> power saws. One person stood on a raised platform, with the board below him,
|
17
|
+
> and the other person stood underneath them.
|
18
|
+
|
19
|
+
The name is a pun on [Nokogiri][nokogiri].
|
20
|
+
|
21
|
+
Oga uses [Semantic Versioning 2.0][semver] as its versioning scheme. All
|
22
|
+
classes, modules and methods are part of the public API _unless_ they are
|
23
|
+
declared as private using Ruby's `private` keyword or YARD's `@api private` tag.
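Because the public API follows Semantic Versioning, a dependent project can reasonably accept any release within the same major version. The sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show what a pessimistic `~> 1.0` constraint accepts. The `compatible?` helper and every version number other than 1.0.3 are illustrative assumptions.

```ruby
require 'rubygems'

# "~> 1.0" means ">= 1.0" and "< 2.0": any 1.x release is accepted, the
# next major version is not.
REQUIREMENT = Gem::Requirement.new('~> 1.0')

# Hypothetical helper: true when the given version string satisfies the
# requirement above.
def compatible?(version)
  REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end

puts compatible?('1.0.3') # the version shown in this diff
puts compatible?('1.4.0') # illustrative later minor release
puts compatible?('2.0.0') # illustrative next major release
```

The first two calls print `true` and the last prints `false`, since a major version bump signals breaking changes under Semantic Versioning.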
|
24
|
+
|
25
|
+
## Examples
|
26
|
+
|
27
|
+
Parsing a simple string of XML:
|
28
|
+
|
29
|
+
Oga.parse_xml('<people><person>Alice</person></people>')
|
30
|
+
|
31
|
+
Parsing XML using strict mode (disables automatic tag insertion):
|
32
|
+
|
33
|
+
Oga.parse_xml('<people>foo</people>', :strict => true) # works fine
|
34
|
+
Oga.parse_xml('<people>foo', :strict => true) # raises an error
|
35
|
+
|
36
|
+
Parsing a simple string of HTML:
|
37
|
+
|
38
|
+
Oga.parse_html('<link rel="stylesheet" href="foo.css">')
|
39
|
+
|
40
|
+
Parsing an IO handle pointing to XML (this also works when using
|
41
|
+
`Oga.parse_html`):
|
42
|
+
|
43
|
+
handle = File.open('path/to/file.xml')
|
44
|
+
|
45
|
+
Oga.parse_xml(handle)
|
46
|
+
|
47
|
+
Parsing an IO handle using the pull parser:
|
48
|
+
|
49
|
+
handle = File.open('path/to/file.xml')
|
50
|
+
parser = Oga::XML::PullParser.new(handle)
|
51
|
+
|
52
|
+
parser.parse do |node|
|
53
|
+
parser.on(:text) do
|
54
|
+
puts node.text
|
55
|
+
end
|
56
|
+
end
|
57
|
+
|
58
|
+
Using an Enumerator to download and parse an XML document on the fly:
|
59
|
+
|
60
|
+
enum = Enumerator.new do |yielder|
|
61
|
+
HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
|
62
|
+
yielder << chunk
|
63
|
+
end
|
64
|
+
end
|
65
|
+
|
66
|
+
document = Oga.parse_xml(enum)
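The Enumerator example above depends on an HTTP client that yields chunks. The same pattern applies to any chunked source. The following sketch uses only the standard library, with a `StringIO` standing in for the network stream. The `chunk_enumerator` helper and the chunk size are hypothetical. The resulting enumerator is the kind of object the README passes to `Oga.parse_xml`.

```ruby
require 'stringio'

# Hypothetical helper: wraps an IO in an Enumerator that lazily yields
# chunks of at most `chunk_size` bytes until the IO is exhausted.
def chunk_enumerator(io, chunk_size = 16)
  Enumerator.new do |yielder|
    # IO#read with a length returns nil at end of input, ending the loop.
    while (chunk = io.read(chunk_size))
      yielder << chunk
    end
  end
end

source = StringIO.new('<people><person>Alice</person></people>')
enum   = chunk_enumerator(source)

# Nothing is read from the source until the enumerator is consumed.
enum.each { |chunk| puts chunk }
```

Building the enumerator does not read anything. Data is only pulled from the source as the consumer iterates, which is what keeps memory usage low for large inputs.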
|
67
|
+
|
68
|
+
Parse a string of XML using the SAX parser:
|
69
|
+
|
70
|
+
class ElementNames
|
71
|
+
attr_reader :names
|
72
|
+
|
73
|
+
def initialize
|
74
|
+
@names = []
|
75
|
+
end
|
76
|
+
|
77
|
+
def on_element(namespace, name, attrs = {})
|
78
|
+
@names << name
|
79
|
+
end
|
80
|
+
end
|
81
|
+
|
82
|
+
handler = ElementNames.new
|
83
|
+
|
84
|
+
Oga.sax_parse_xml(handler, '<foo><bar></bar></foo>')
|
85
|
+
|
86
|
+
handler.names # => ["foo", "bar"]
|
87
|
+
|
88
|
+
Querying a document using XPath:
|
89
|
+
|
90
|
+
document = Oga.parse_xml <<-EOF
|
91
|
+
<people>
|
92
|
+
<person id="1">
|
93
|
+
<name>Alice</name>
|
94
|
+
<age>28</age>
|
95
|
+
</person>
|
96
|
+
</people>
|
97
|
+
EOF
|
98
|
+
|
99
|
+
# The "xpath" method returns an enumerable (Oga::XML::NodeSet) that you can
|
100
|
+
# iterate over.
|
101
|
+
document.xpath('people/person').each do |person|
|
102
|
+
puts person.get('id') # => "1"
|
103
|
+
|
104
|
+
# The "at_xpath" method returns a single node from a set, it's the same as
|
105
|
+
# person.xpath('name').first.
|
106
|
+
puts person.at_xpath('name').text # => "Alice"
|
107
|
+
end
|
108
|
+
|
109
|
+
Querying the same document using CSS:
|
110
|
+
|
111
|
+
document = Oga.parse_xml <<-EOF
|
112
|
+
<people>
|
113
|
+
<person id="1">
|
114
|
+
<name>Alice</name>
|
115
|
+
<age>28</age>
|
116
|
+
</person>
|
117
|
+
</people>
|
118
|
+
EOF
|
119
|
+
|
120
|
+
# The "css" method returns an enumerable (Oga::XML::NodeSet) that you can
|
121
|
+
# iterate over.
|
122
|
+
document.css('people person').each do |person|
|
123
|
+
puts person.get('id') # => "1"
|
124
|
+
|
125
|
+
# The "at_css" method returns a single node from a set, it's the same as
|
126
|
+
# person.css('name').first.
|
127
|
+
puts person.at_css('name').text # => "Alice"
|
128
|
+
end
|
129
|
+
|
130
|
+
Modifying a document and serializing it back to XML:
|
131
|
+
|
132
|
+
document = Oga.parse_xml('<people><person>Alice</person></people>')
|
133
|
+
name = document.at_xpath('people/person[1]/text()')
|
134
|
+
|
135
|
+
name.text = 'Bob'
|
136
|
+
|
137
|
+
document.to_xml # => "<people><person>Bob</person></people>"
|
138
|
+
|
139
|
+
Querying a document using a namespace:
|
140
|
+
|
141
|
+
document = Oga.parse_xml('<root xmlns:x="foo"><x:div></x:div></root>')
|
142
|
+
div = document.xpath('root/x:div').first
|
143
|
+
|
144
|
+
div.namespace # => Namespace(name: "x" uri: "foo")
|
145
|
+
|
146
|
+
## Features
|
147
|
+
|
148
|
+
* Support for parsing XML and HTML(5)
|
149
|
+
* DOM parsing
|
150
|
+
* Stream/pull parsing
|
151
|
+
* SAX parsing
|
152
|
+
* Low memory footprint
|
153
|
+
* High performance, if something doesn't perform well enough it's a bug
|
154
|
+
* Support for XPath 1.0
|
155
|
+
* CSS3 selector support
|
156
|
+
* XML namespace support (registering, querying, etc)
|
157
|
+
|
158
|
+
## Requirements
|
159
|
+
|
160
|
+
| Ruby | Required | Recommended |
|
161
|
+
|:---------|:--------------|:------------|
|
162
|
+
| MRI | >= 1.9.3 | >= 2.1.2 |
|
163
|
+
| Rubinius | >= 2.2 | >= 2.2.10 |
|
164
|
+
| JRuby | >= 1.7 | >= 1.7.12 |
|
165
|
+
| Maglev | Not supported | |
|
166
|
+
| Topaz | Not supported | |
|
167
|
+
| mruby | Not supported | |
|
168
|
+
|
169
|
+
Maglev and Topaz are not supported due to the lack of a C API (that I know of)
|
170
|
+
and the lack of active development of these Ruby implementations. mruby is not
|
171
|
+
supported because it's a very different implementation altogether.
|
172
|
+
|
173
|
+
To install Oga on MRI or Rubinius you'll need to have a working compiler such as
|
174
|
+
gcc or clang. Oga's C extension can be compiled with both. JRuby does not
|
175
|
+
require a compiler as the native extension is compiled during the Gem building
|
176
|
+
process and bundled inside the Gem itself.
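The minimum versions in the requirements table can be expressed as a small runtime check. This is only a sketch: the `oga_supported?` helper is hypothetical, the engine names are the values `RUBY_ENGINE` reports on each implementation, and treating engines missing from the table as unsupported is an assumption made here for simplicity.

```ruby
require 'rubygems'

# Minimum versions taken from the "Required" column of the table above.
MINIMUM_VERSIONS = {
  'ruby'  => '1.9.3', # MRI reports RUBY_ENGINE as "ruby"
  'rbx'   => '2.2',   # Rubinius
  'jruby' => '1.7'
}.freeze

# Hypothetical helper: true when the given engine and engine version meet
# the minimum from the table. The version must be the engine's own version
# (for example JRUBY_VERSION on JRuby), not the MRI compatibility version.
def oga_supported?(engine, version)
  minimum = MINIMUM_VERSIONS[engine]
  return false unless minimum # assumption: unlisted engines are unsupported

  Gem::Version.new(version) >= Gem::Version.new(minimum)
end

puts oga_supported?('ruby', '2.1.2')
puts oga_supported?('jruby', '1.6.8')
```

On Ruby 2.3 and later the current interpreter could be checked with `oga_supported?(RUBY_ENGINE, RUBY_ENGINE_VERSION)`.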
|
177
|
+
|
178
|
+
## Thread Safety
|
179
|
+
|
180
|
+
Documents parsed using Oga are thread-safe as long as they are not modified by
|
181
|
+
multiple threads at the same time. Querying documents using XPath can be done by
|
182
|
+
multiple threads just fine. Write operations, such as removing attributes, are
|
183
|
+
_not_ thread-safe and should not be done by multiple threads at once.
|
184
|
+
|
185
|
+
It is advised that you do not share parsed documents between threads unless you
|
186
|
+
_really_ have to.
|
187
|
+
|
188
|
+
## Namespace Support
|
189
|
+
|
190
|
+
Oga fully supports parsing/registering XML namespaces as well as querying them
|
191
|
+
using XPath. For example, take the following XML:
|
192
|
+
|
193
|
+
<root xmlns="http://example.com">
|
194
|
+
<bar>bar</bar>
|
195
|
+
</root>
|
196
|
+
|
197
|
+
If one were to try and query the `bar` element (e.g. using XPath `root/bar`)
|
198
|
+
they'd end up with an empty node set. This is due to `<root>` defining an
|
199
|
+
alternative default namespace. Instead you can query this element using the
|
200
|
+
following XPath:
|
201
|
+
|
202
|
+
*[local-name() = "root"]/*[local-name() = "bar"]
|
203
|
+
|
204
|
+
Alternatively, if you don't really care where the `<bar>` element is located you
|
205
|
+
can use the following:
|
206
|
+
|
207
|
+
descendant::*[local-name() = "bar"]
|
208
|
+
|
209
|
+
And if you want to specify an explicit namespace URI, you can use this:
|
210
|
+
|
211
|
+
descendant::*[local-name() = "bar" and namespace-uri() = "http://example.com"]
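The `local-name()` form of the query shown earlier in this section becomes verbose for longer paths. One way to keep it manageable is to generate the expression from a list of element names. The `local_name_path` helper below is hypothetical (it is not part of Oga) and assumes the names contain no double quotes. It only builds the XPath string, which could then be passed to `document.xpath`.

```ruby
# Hypothetical helper: builds a namespace-agnostic XPath expression that
# matches every step of the path by local name only.
def local_name_path(*names)
  names.map { |name| format('*[local-name() = "%s"]', name) }.join('/')
end

puts local_name_path('root', 'bar')
# prints: *[local-name() = "root"]/*[local-name() = "bar"]
```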
|
212
|
+
|
213
|
+
Unlike Nokogiri, Oga does _not_ provide a way to create "dynamic" namespaces.
|
214
|
+
That is, Nokogiri allows one to query the above document as follows:
|
215
|
+
|
216
|
+
document = Nokogiri::XML('<root xmlns="http://example.com"><bar>bar</bar></root>')
|
217
|
+
|
218
|
+
document.xpath('x:root/x:bar', :x => 'http://example.com')
|
219
|
+
|
220
|
+
Oga does have a small trick you can use to cut down the size of your XPath
|
221
|
+
queries. Because Oga assigns the name "xmlns" to default namespaces you can use
|
222
|
+
this in your XPath queries:
|
223
|
+
|
224
|
+
document = Oga.parse_xml('<root xmlns="http://example.com"><bar>bar</bar></root>')
|
225
|
+
|
226
|
+
document.xpath('xmlns:root/xmlns:bar')
|
227
|
+
|
228
|
+
When using this you can still restrict the query to the correct namespace URI:
|
229
|
+
|
230
|
+
document.xpath('xmlns:root[namespace-uri() = "http://example.com"]/xmlns:bar')
|
231
|
+
|
232
|
+
In the future I might add an API to ease this process, although at this time I
|
233
|
+
have little interest in providing an API similar to Nokogiri.
|
234
|
+
|
235
|
+
## HTML5 Support
|
236
|
+
|
237
|
+
Oga fully supports HTML5 including the omission of certain tags. For example,
|
238
|
+
the following is parsed just fine:
|
239
|
+
|
240
|
+
<li>Hello
|
241
|
+
<li>World
|
242
|
+
|
243
|
+
This is effectively parsed into:
|
244
|
+
|
245
|
+
<li>Hello</li>
|
246
|
+
<li>World</li>
|
247
|
+
|
248
|
+
One exception Oga makes is that it does _not_ automatically insert `html`,
|
249
|
+
`head` and `body` tags. Automatically inserting these tags requires a
|
250
|
+
distinction between documents and fragments as a user might not always want
|
251
|
+
these tags to be inserted if left out. This complicates the user facing API as
|
252
|
+
well as complicating the parsing internals of Oga. As a result I have decided
|
253
|
+
that Oga _does not_ insert these tags when left out.
|
254
|
+
|
255
|
+
A more in depth explanation can be found here:
|
256
|
+
<https://github.com/YorickPeterse/oga/issues/98#issuecomment-96833066>.
|
257
|
+
|
258
|
+
## Documentation
|
259
|
+
|
260
|
+
The documentation is best viewed [on the documentation website][doc-website].
|
261
|
+
|
262
|
+
* {file:CONTRIBUTING Contributing}
|
263
|
+
* {file:changelog Changelog}
|
264
|
+
* {file:migrating\_from\_nokogiri Migrating From Nokogiri}
|
265
|
+
* {Oga::XML::Parser XML Parser}
|
266
|
+
* {Oga::XML::SaxParser XML SAX Parser}
|
267
|
+
* {file:xml\_namespaces XML Namespaces}
|
268
|
+
|
269
|
+
## Why Another HTML/XML parser?
|
270
|
+
|
271
|
+
Currently there are a few existing parsers out there, the most famous one being
|
272
|
+
[Nokogiri][nokogiri]. Another parser that's becoming more popular these days is
|
273
|
+
[Ox][ox]. Ruby's standard library also comes with REXML.
|
274
|
+
|
275
|
+
The sad truth is that these existing libraries are problematic in their own
|
276
|
+
ways. Nokogiri for example is extremely unstable on Rubinius. On MRI it works
|
277
|
+
because of the non-concurrent nature of MRI, on JRuby it works because it's
|
278
|
+
implemented in Java. Nokogiri also uses libxml2 which is a massive beast of a
|
279
|
+
library, is not thread-safe and problematic to install on certain platforms
|
280
|
+
(apparently). I don't want to compile libxml2 every time I install Nokogiri
|
281
|
+
either.
|
282
|
+
|
283
|
+
To give an example about the issues with Nokogiri on Rubinius (or any other
|
284
|
+
Ruby implementation that is not MRI or JRuby), take a look at these issues:
|
285
|
+
|
286
|
+
* <https://github.com/rubinius/rubinius/issues/2957>
|
287
|
+
* <https://github.com/rubinius/rubinius/issues/2908>
|
288
|
+
* <https://github.com/rubinius/rubinius/issues/2462>
|
289
|
+
* <https://github.com/sparklemotion/nokogiri/issues/1047>
|
290
|
+
* <https://github.com/sparklemotion/nokogiri/issues/939>
|
291
|
+
|
292
|
+
Some of these have been fixed, some have not. The core problem remains:
|
293
|
+
Nokogiri acts in a way that there can be a large number of places where it
|
294
|
+
*might* break due to throwing around void pointers and what not and expecting
|
295
|
+
that things magically work. Note that I have nothing against the people running
|
296
|
+
these projects, I just heavily, *heavily* dislike the resulting codebase one
|
297
|
+
has to deal with today.
|
298
|
+
|
299
|
+
Ox looks very promising but it lacks a rather crucial feature: parsing HTML
|
300
|
+
(without using a SAX API). It's also again a C extension making debugging more
|
301
|
+
of a pain (at least for me).
|
302
|
+
|
303
|
+
I just want an XML/HTML parser that I can rely on stability wise and that is
|
304
|
+
written in Ruby so I can actually debug it. In theory it should also make it
|
305
|
+
easier for other Ruby developers to contribute.
|
306
|
+
|
307
|
+
## License
|
308
|
+
|
309
|
+
All source code in this repository is subject to the terms of the Mozilla Public
|
310
|
+
License, version 2.0 unless stated otherwise. A copy of this license can be
|
311
|
+
found in the file "LICENSE" or at <https://www.mozilla.org/MPL/2.0/>.
|
312
|
+
|
313
|
+
[nokogiri]: https://github.com/sparklemotion/nokogiri
|
314
|
+
[oga-wikipedia]: https://en.wikipedia.org/wiki/Japanese_saw#Other_Japanese_saws
|
315
|
+
[ox]: https://github.com/ohler55/ox
|
316
|
+
[doc-website]: http://code.yorickpeterse.com/oga/latest/
|
317
|
+
[semver]: http://semver.org/spec/v2.0.0.html
|