rubyjedi-oga 1.0.3
- checksums.yaml +7 -0
- data/.yardopts +13 -0
- data/LICENSE +362 -0
- data/README.md +317 -0
- data/doc/css/common.css +77 -0
- data/doc/css_selectors.md +935 -0
- data/doc/manually_creating_documents.md +67 -0
- data/doc/migrating_from_nokogiri.md +169 -0
- data/doc/xml_namespaces.md +63 -0
- data/ext/c/extconf.rb +11 -0
- data/ext/c/lexer.c +2595 -0
- data/ext/c/lexer.h +16 -0
- data/ext/c/lexer.rl +198 -0
- data/ext/c/liboga.c +6 -0
- data/ext/c/liboga.h +11 -0
- data/ext/java/Liboga.java +14 -0
- data/ext/java/org/liboga/xml/Lexer.java +1363 -0
- data/ext/java/org/liboga/xml/Lexer.rl +223 -0
- data/ext/ragel/base_lexer.rl +633 -0
- data/lib/oga.rb +57 -0
- data/lib/oga/blacklist.rb +40 -0
- data/lib/oga/css/lexer.rb +743 -0
- data/lib/oga/css/parser.rb +976 -0
- data/lib/oga/entity_decoder.rb +21 -0
- data/lib/oga/html/entities.rb +2150 -0
- data/lib/oga/html/parser.rb +25 -0
- data/lib/oga/html/sax_parser.rb +18 -0
- data/lib/oga/lru.rb +160 -0
- data/lib/oga/oga.rb +57 -0
- data/lib/oga/version.rb +3 -0
- data/lib/oga/whitelist.rb +20 -0
- data/lib/oga/xml/attribute.rb +136 -0
- data/lib/oga/xml/cdata.rb +17 -0
- data/lib/oga/xml/character_node.rb +37 -0
- data/lib/oga/xml/comment.rb +17 -0
- data/lib/oga/xml/default_namespace.rb +13 -0
- data/lib/oga/xml/doctype.rb +82 -0
- data/lib/oga/xml/document.rb +108 -0
- data/lib/oga/xml/element.rb +428 -0
- data/lib/oga/xml/entities.rb +122 -0
- data/lib/oga/xml/html_void_elements.rb +15 -0
- data/lib/oga/xml/lexer.rb +550 -0
- data/lib/oga/xml/namespace.rb +48 -0
- data/lib/oga/xml/node.rb +219 -0
- data/lib/oga/xml/node_set.rb +333 -0
- data/lib/oga/xml/parser.rb +631 -0
- data/lib/oga/xml/processing_instruction.rb +37 -0
- data/lib/oga/xml/pull_parser.rb +175 -0
- data/lib/oga/xml/querying.rb +56 -0
- data/lib/oga/xml/sax_parser.rb +192 -0
- data/lib/oga/xml/text.rb +66 -0
- data/lib/oga/xml/traversal.rb +50 -0
- data/lib/oga/xml/xml_declaration.rb +65 -0
- data/lib/oga/xpath/evaluator.rb +1798 -0
- data/lib/oga/xpath/lexer.rb +1958 -0
- data/lib/oga/xpath/parser.rb +622 -0
- data/oga.gemspec +45 -0
- metadata +227 -0
checksums.yaml
ADDED
```yaml
---
SHA1:
  metadata.gz: d5ee55c04377dd30ae94fbe33556d4d535f27cc6
  data.tar.gz: 82522f8cb52c9511e16930b60e9e7e3eb12aa0e0
SHA512:
  metadata.gz: a8d082defeb61a5e2338a8e772694ba46e266bfb43def0aadf7ca73c7806385d0e1dd47d5e244fdecae6b7b29d9f7c5dfb1d6a65af6315a0a2120d5d86da6328
  data.tar.gz: 2c9302cfb0bff98375b7b4ca946ce47bda444628bc53a5b5cb7c9832ea39b9e6224a27d8be22f6aca4a9b85743f8b03048e8b454cd46e8428a75b08a52ea6326
```
data/.yardopts
ADDED
data/LICENSE
ADDED
Mozilla Public License, version 2.0

1. Definitions

1.1. "Contributor"

means each individual or legal entity that creates, contributes to the
creation of, or owns Covered Software.

1.2. "Contributor Version"

means the combination of the Contributions of others (if any) used by a
Contributor and that particular Contributor's Contribution.

1.3. "Contribution"

means Covered Software of a particular Contributor.

1.4. "Covered Software"

means Source Code Form to which the initial Contributor has attached the
notice in Exhibit A, the Executable Form of such Source Code Form, and
Modifications of such Source Code Form, in each case including portions
thereof.

1.5. "Incompatible With Secondary Licenses"
means

a. that the initial Contributor has attached the notice described in
Exhibit B to the Covered Software; or

b. that the Covered Software was made available under the terms of
version 1.1 or earlier of the License, but not also under the terms of
a Secondary License.

1.6. "Executable Form"

means any form of the work other than Source Code Form.

1.7. "Larger Work"

means a work that combines Covered Software with other material, in a
separate file or files, that is not Covered Software.

1.8. "License"

means this document.

1.9. "Licensable"

means having the right to grant, to the maximum extent possible, whether
at the time of the initial grant or subsequently, any and all of the
rights conveyed by this License.

1.10. "Modifications"

means any of the following:

a. any file in Source Code Form that results from an addition to,
deletion from, or modification of the contents of Covered Software; or

b. any new file in Source Code Form that contains any Covered Software.

1.11. "Patent Claims" of a Contributor

means any patent claim(s), including without limitation, method,
process, and apparatus claims, in any patent Licensable by such
Contributor that would be infringed, but for the grant of the License,
by the making, using, selling, offering for sale, having made, import,
or transfer of either its Contributions or its Contributor Version.

1.12. "Secondary License"

means either the GNU General Public License, Version 2.0, the GNU Lesser
General Public License, Version 2.1, the GNU Affero General Public
License, Version 3.0, or any later versions of those licenses.

1.13. "Source Code Form"

means the form of the work preferred for making modifications.

1.14. "You" (or "Your")

means an individual or a legal entity exercising rights under this
License. For legal entities, "You" includes any entity that controls, is
controlled by, or is under common control with You. For purposes of this
definition, "control" means (a) the power, direct or indirect, to cause
the direction or management of such entity, whether by contract or
otherwise, or (b) ownership of more than fifty percent (50%) of the
outstanding shares or beneficial ownership of such entity.


2. License Grants and Conditions

2.1. Grants

Each Contributor hereby grants You a world-wide, royalty-free,
non-exclusive license:

a. under intellectual property rights (other than patent or trademark)
Licensable by such Contributor to use, reproduce, make available,
modify, display, perform, distribute, and otherwise exploit its
Contributions, either on an unmodified basis, with Modifications, or
as part of a Larger Work; and

b. under Patent Claims of such Contributor to make, use, sell, offer for
sale, have made, import, and otherwise transfer either its
Contributions or its Contributor Version.

2.2. Effective Date

The licenses granted in Section 2.1 with respect to any Contribution
become effective for each Contribution on the date the Contributor first
distributes such Contribution.

2.3. Limitations on Grant Scope

The licenses granted in this Section 2 are the only rights granted under
this License. No additional rights or licenses will be implied from the
distribution or licensing of Covered Software under this License.
Notwithstanding Section 2.1(b) above, no patent license is granted by a
Contributor:

a. for any code that a Contributor has removed from Covered Software; or

b. for infringements caused by: (i) Your and any other third party's
modifications of Covered Software, or (ii) the combination of its
Contributions with other software (except as part of its Contributor
Version); or

c. under Patent Claims infringed by Covered Software in the absence of
its Contributions.

This License does not grant any rights in the trademarks, service marks,
or logos of any Contributor (except as may be necessary to comply with
the notice requirements in Section 3.4).

2.4. Subsequent Licenses

No Contributor makes additional grants as a result of Your choice to
distribute the Covered Software under a subsequent version of this
License (see Section 10.2) or under the terms of a Secondary License (if
permitted under the terms of Section 3.3).

2.5. Representation

Each Contributor represents that the Contributor believes its
Contributions are its original creation(s) or it has sufficient rights to
grant the rights to its Contributions conveyed by this License.

2.6. Fair Use

This License is not intended to limit any rights You have under
applicable copyright doctrines of fair use, fair dealing, or other
equivalents.

2.7. Conditions

Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in
Section 2.1.


3. Responsibilities

3.1. Distribution of Source Form

All distribution of Covered Software in Source Code Form, including any
Modifications that You create or to which You contribute, must be under
the terms of this License. You must inform recipients that the Source
Code Form of the Covered Software is governed by the terms of this
License, and how they can obtain a copy of this License. You may not
attempt to alter or restrict the recipients' rights in the Source Code
Form.

3.2. Distribution of Executable Form

If You distribute Covered Software in Executable Form then:

a. such Covered Software must also be made available in Source Code Form,
as described in Section 3.1, and You must inform recipients of the
Executable Form how they can obtain a copy of such Source Code Form by
reasonable means in a timely manner, at a charge no more than the cost
of distribution to the recipient; and

b. You may distribute such Executable Form under the terms of this
License, or sublicense it under different terms, provided that the
license for the Executable Form does not attempt to limit or alter the
recipients' rights in the Source Code Form under this License.

3.3. Distribution of a Larger Work

You may create and distribute a Larger Work under terms of Your choice,
provided that You also comply with the requirements of this License for
the Covered Software. If the Larger Work is a combination of Covered
Software with a work governed by one or more Secondary Licenses, and the
Covered Software is not Incompatible With Secondary Licenses, this
License permits You to additionally distribute such Covered Software
under the terms of such Secondary License(s), so that the recipient of
the Larger Work may, at their option, further distribute the Covered
Software under the terms of either this License or such Secondary
License(s).

3.4. Notices

You may not remove or alter the substance of any license notices
(including copyright notices, patent notices, disclaimers of warranty, or
limitations of liability) contained within the Source Code Form of the
Covered Software, except that You may alter any license notices to the
extent required to remedy known factual inaccuracies.

3.5. Application of Additional Terms

You may choose to offer, and to charge a fee for, warranty, support,
indemnity or liability obligations to one or more recipients of Covered
Software. However, You may do so only on Your own behalf, and not on
behalf of any Contributor. You must make it absolutely clear that any
such warranty, support, indemnity, or liability obligation is offered by
You alone, and You hereby agree to indemnify every Contributor for any
liability incurred by such Contributor as a result of warranty, support,
indemnity or liability terms You offer. You may include additional
disclaimers of warranty and limitations of liability specific to any
jurisdiction.

4. Inability to Comply Due to Statute or Regulation

If it is impossible for You to comply with any of the terms of this License
with respect to some or all of the Covered Software due to statute,
judicial order, or regulation then You must: (a) comply with the terms of
this License to the maximum extent possible; and (b) describe the
limitations and the code they affect. Such description must be placed in a
text file included with all distributions of the Covered Software under
this License. Except to the extent prohibited by statute or regulation,
such description must be sufficiently detailed for a recipient of ordinary
skill to be able to understand it.

5. Termination

5.1. The rights granted under this License will terminate automatically if You
fail to comply with any of its terms. However, if You become compliant,
then the rights granted under this License from a particular Contributor
are reinstated (a) provisionally, unless and until such Contributor
explicitly and finally terminates Your grants, and (b) on an ongoing
basis, if such Contributor fails to notify You of the non-compliance by
some reasonable means prior to 60 days after You have come back into
compliance. Moreover, Your grants from a particular Contributor are
reinstated on an ongoing basis if such Contributor notifies You of the
non-compliance by some reasonable means, this is the first time You have
received notice of non-compliance with this License from such
Contributor, and You become compliant prior to 30 days after Your receipt
of the notice.

5.2. If You initiate litigation against any entity by asserting a patent
infringement claim (excluding declaratory judgment actions,
counter-claims, and cross-claims) alleging that a Contributor Version
directly or indirectly infringes any patent, then the rights granted to
You by any and all Contributors for the Covered Software under Section
2.1 of this License shall terminate.

5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user
license agreements (excluding distributors and resellers) which have been
validly granted by You or Your distributors under this License prior to
termination shall survive termination.

6. Disclaimer of Warranty

Covered Software is provided under this License on an "as is" basis,
without warranty of any kind, either expressed, implied, or statutory,
including, without limitation, warranties that the Covered Software is free
of defects, merchantable, fit for a particular purpose or non-infringing.
The entire risk as to the quality and performance of the Covered Software
is with You. Should any Covered Software prove defective in any respect,
You (not any Contributor) assume the cost of any necessary servicing,
repair, or correction. This disclaimer of warranty constitutes an essential
part of this License. No use of any Covered Software is authorized under
this License except under this disclaimer.

7. Limitation of Liability

Under no circumstances and under no legal theory, whether tort (including
negligence), contract, or otherwise, shall any Contributor, or anyone who
distributes Covered Software as permitted above, be liable to You for any
direct, indirect, special, incidental, or consequential damages of any
character including, without limitation, damages for lost profits, loss of
goodwill, work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses, even if such party shall have been
informed of the possibility of such damages. This limitation of liability
shall not apply to liability for death or personal injury resulting from
such party's negligence to the extent applicable law prohibits such
limitation. Some jurisdictions do not allow the exclusion or limitation of
incidental or consequential damages, so this exclusion and limitation may
not apply to You.

8. Litigation

Any litigation relating to this License may be brought only in the courts
of a jurisdiction where the defendant maintains its principal place of
business and such litigation shall be governed by laws of that
jurisdiction, without reference to its conflict-of-law provisions. Nothing
in this Section shall prevent a party's ability to bring cross-claims or
counter-claims.

9. Miscellaneous

This License represents the complete agreement concerning the subject
matter hereof. If any provision of this License is held to be
unenforceable, such provision shall be reformed only to the extent
necessary to make it enforceable. Any law or regulation which provides that
the language of a contract shall be construed against the drafter shall not
be used to construe this License against a Contributor.


10. Versions of the License

10.1. New Versions

Mozilla Foundation is the license steward. Except as provided in Section
10.3, no one other than the license steward has the right to modify or
publish new versions of this License. Each version will be given a
distinguishing version number.

10.2. Effect of New Versions

You may distribute the Covered Software under the terms of the version
of the License under which You originally received the Covered Software,
or under the terms of any subsequent version published by the license
steward.

10.3. Modified Versions

If you create software not governed by this License, and you want to
create a new license for such software, you may create and use a
modified version of this License if you rename the license and remove
any references to the name of the license steward (except to note that
such modified license differs from this License).

10.4. Distributing Source Code Form that is Incompatible With Secondary
Licenses If You choose to distribute Source Code Form that is
Incompatible With Secondary Licenses under the terms of this version of
the License, the notice described in Exhibit B of this License must be
attached.

Exhibit A - Source Code Form License Notice

This Source Code Form is subject to the
terms of the Mozilla Public License, v.
2.0. If a copy of the MPL was not
distributed with this file, You can
obtain one at
http://mozilla.org/MPL/2.0/.

If it is not possible or desirable to put the notice in a particular file,
then You may include the notice in a location (such as a LICENSE file in a
relevant directory) where a recipient would be likely to look for such a
notice.

You may add additional accurate notices of copyright ownership.

Exhibit B - "Incompatible With Secondary Licenses" Notice

This Source Code Form is "Incompatible
With Secondary Licenses", as defined by
the Mozilla Public License, v. 2.0.
data/README.md
ADDED
# Oga

Oga is an XML/HTML parser written in Ruby. It provides an easy-to-use API for
parsing, modifying and querying documents (using XPath expressions or CSS
selectors). Oga does not require system libraries such as libxml, making it
easier and faster to install on various platforms. To achieve better
performance Oga uses a small, native extension (C for MRI/Rubinius, Java for
JRuby).

Oga provides an API that allows you to safely parse and query documents in a
multi-threaded environment without having to worry about your application
blowing up, as long as a document isn't modified by multiple threads at once
(see "Thread Safety" below).

From [Wikipedia][oga-wikipedia]:

> Oga: A large two-person saw used for ripping large boards in the days before
> power saws. One person stood on a raised platform, with the board below him,
> and the other person stood underneath them.

The name is a pun on [Nokogiri][nokogiri].

Oga uses [Semantic Versioning 2.0][semver] as its versioning scheme. All
classes, modules and methods are part of the public API _unless_ they are
declared as private using Ruby's `private` keyword or YARD's `@api private` tag.

## Examples

Parsing a simple string of XML:

```ruby
Oga.parse_xml('<people><person>Alice</person></people>')
```

Parsing XML using strict mode (disables automatic tag insertion):

```ruby
Oga.parse_xml('<people>foo</people>', :strict => true) # works fine
Oga.parse_xml('<people>foo', :strict => true) # raises an error
```

Parsing a simple string of HTML:

```ruby
Oga.parse_html('<link rel="stylesheet" href="foo.css">')
```

Parsing an IO handle pointing to XML (this also works when using
`Oga.parse_html`):

```ruby
handle = File.open('path/to/file.xml')

Oga.parse_xml(handle)
```

Parsing an IO handle using the pull parser:

```ruby
handle = File.open('path/to/file.xml')
parser = Oga::XML::PullParser.new(handle)

parser.parse do |node|
  parser.on(:text) do
    puts node.text
  end
end
```

Using an Enumerator to download and parse an XML document on the fly:

```ruby
require 'httpclient' # HTTPClient comes from the httpclient gem

enum = Enumerator.new do |yielder|
  HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
    yielder << chunk
  end
end

document = Oga.parse_xml(enum)
```

Parsing a string of XML using the SAX parser:

```ruby
class ElementNames
  attr_reader :names

  def initialize
    @names = []
  end

  def on_element(namespace, name, attrs = {})
    @names << name
  end
end

handler = ElementNames.new

Oga.sax_parse_xml(handler, '<foo><bar></bar></foo>')

handler.names # => ["foo", "bar"]
```

Querying a document using XPath:

```ruby
document = Oga.parse_xml <<-EOF
<people>
  <person id="1">
    <name>Alice</name>
    <age>28</age>
  </person>
</people>
EOF

# The "xpath" method returns an enumerable (Oga::XML::NodeSet) that you can
# iterate over.
document.xpath('people/person').each do |person|
  puts person.get('id') # => "1"

  # The "at_xpath" method returns a single node from a set; it's the same as
  # person.xpath('name').first.
  puts person.at_xpath('name').text # => "Alice"
end
```

Querying the same document using CSS:

```ruby
document = Oga.parse_xml <<-EOF
<people>
  <person id="1">
    <name>Alice</name>
    <age>28</age>
  </person>
</people>
EOF

# The "css" method returns an enumerable (Oga::XML::NodeSet) that you can
# iterate over.
document.css('people person').each do |person|
  puts person.get('id') # => "1"

  # The "at_css" method returns a single node from a set; it's the same as
  # person.css('name').first.
  puts person.at_css('name').text # => "Alice"
end
```

Modifying a document and serializing it back to XML:

```ruby
document = Oga.parse_xml('<people><person>Alice</person></people>')
name = document.at_xpath('people/person[1]/text()')

name.text = 'Bob'

document.to_xml # => "<people><person>Bob</person></people>"
```

Querying a document using a namespace:

```ruby
document = Oga.parse_xml('<root xmlns:x="foo"><x:div></x:div></root>')
div = document.xpath('root/x:div').first

div.namespace # => Namespace(name: "x" uri: "foo")
```

## Features

* Support for parsing XML and HTML(5)
* DOM parsing
* Stream/pull parsing
* SAX parsing
* Low memory footprint
* High performance: if something doesn't perform well enough, it's a bug
* Support for XPath 1.0
* CSS3 selector support
* XML namespace support (registering, querying, etc.)

## Requirements

| Ruby     | Required      | Recommended |
|:---------|:--------------|:------------|
| MRI      | >= 1.9.3      | >= 2.1.2    |
| Rubinius | >= 2.2        | >= 2.2.10   |
| JRuby    | >= 1.7        | >= 1.7.12   |
| Maglev   | Not supported |             |
| Topaz    | Not supported |             |
| mruby    | Not supported |             |

Maglev and Topaz are not supported due to the lack of a C API (that I know of)
and the lack of active development of these Ruby implementations. mruby is not
supported because it's a very different implementation altogether.

To install Oga on MRI or Rubinius you'll need to have a working compiler such as
gcc or clang. Oga's C extension can be compiled with both. JRuby does not
require a compiler as the native extension is compiled during the Gem building
process and bundled inside the Gem itself.

## Thread Safety

Documents parsed using Oga are thread-safe as long as they are not modified by
multiple threads at the same time. Querying documents using XPath can be done by
multiple threads just fine. Write operations, such as removing attributes, are
_not_ thread-safe and should not be done by multiple threads at once.

It is advised that you do not share parsed documents between threads unless you
_really_ have to.

## Namespace Support

Oga fully supports parsing/registering XML namespaces as well as querying them
using XPath. For example, take the following XML:

```xml
<root xmlns="http://example.com">
  <bar>bar</bar>
</root>
```

If one were to try and query the `bar` element (e.g. using XPath `root/bar`)
they'd end up with an empty node set. This is due to `<root>` defining an
alternative default namespace. Instead you can query this element using the
following XPath:

```
*[local-name() = "root"]/*[local-name() = "bar"]
```

Alternatively, if you don't really care where the `<bar>` element is located you
can use the following:

```
descendant::*[local-name() = "bar"]
```

And if you want to specify an explicit namespace URI, you can use this:

```
descendant::*[local-name() = "bar" and namespace-uri() = "http://example.com"]
```

Unlike Nokogiri, Oga does _not_ provide a way to create "dynamic" namespaces.
That is, Nokogiri allows one to query the above document as follows:

```ruby
document = Nokogiri::XML('<root xmlns="http://example.com"><bar>bar</bar></root>')

document.xpath('x:root/x:bar', :x => 'http://example.com')
```

Oga does have a small trick you can use to cut down the size of your XPath
queries. Because Oga assigns the name "xmlns" to default namespaces, you can use
this in your XPath queries:

```ruby
document = Oga.parse_xml('<root xmlns="http://example.com"><bar>bar</bar></root>')

document.xpath('xmlns:root/xmlns:bar')
```

When using this you can still restrict the query to the correct namespace URI:

```ruby
document.xpath('xmlns:root[namespace-uri() = "http://example.com"]/xmlns:bar')
```

In the future I might add an API to ease this process, although at this time I
have little interest in providing an API similar to Nokogiri.

## HTML5 Support

Oga fully supports HTML5 including the omission of certain tags. For example,
the following is parsed just fine:

```html
<li>Hello
<li>World
```

This is effectively parsed into:

```html
<li>Hello</li>
<li>World</li>
```

One exception Oga makes is that it does _not_ automatically insert `html`,
`head` and `body` tags. Automatically inserting these tags requires a
distinction between documents and fragments as a user might not always want
these tags to be inserted if left out. This complicates the user-facing API as
well as complicating the parsing internals of Oga. As a result, I have decided
that Oga _does not_ insert these tags when left out.

A more in-depth explanation can be found here:
<https://github.com/YorickPeterse/oga/issues/98#issuecomment-96833066>.

## Documentation

The documentation is best viewed [on the documentation website][doc-website].

* {file:CONTRIBUTING Contributing}
* {file:changelog Changelog}
* {file:migrating\_from\_nokogiri Migrating From Nokogiri}
* {Oga::XML::Parser XML Parser}
* {Oga::XML::SaxParser XML SAX Parser}
* {file:xml\_namespaces XML Namespaces}

## Why Another HTML/XML parser?

Currently there are a few existing parsers out there, the most famous one being
[Nokogiri][nokogiri]. Another parser that's becoming more popular these days is
[Ox][ox]. Ruby's standard library also comes with REXML.

The sad truth is that these existing libraries are problematic in their own
ways. Nokogiri, for example, is extremely unstable on Rubinius. On MRI it works
because of the non-concurrent nature of MRI, and on JRuby it works because it's
implemented in Java. Nokogiri also uses libxml2, which is a massive beast of a
library, is not thread-safe and is problematic to install on certain platforms
(apparently). I don't want to compile libxml2 every time I install Nokogiri
either.

To give an example of the issues with Nokogiri on Rubinius (or any other
Ruby implementation that is not MRI or JRuby), take a look at these issues:

* <https://github.com/rubinius/rubinius/issues/2957>
* <https://github.com/rubinius/rubinius/issues/2908>
* <https://github.com/rubinius/rubinius/issues/2462>
* <https://github.com/sparklemotion/nokogiri/issues/1047>
* <https://github.com/sparklemotion/nokogiri/issues/939>

Some of these have been fixed, some have not. The core problem remains:
Nokogiri is built in a way that leaves a large number of places where it
*might* break, due to throwing around void pointers and whatnot and expecting
that things magically work. Note that I have nothing against the people running
these projects; I just heavily, *heavily* dislike the resulting codebase one
has to deal with today.

Ox looks very promising, but it lacks a rather crucial feature: parsing HTML
(without using a SAX API). It's also, again, a C extension, making debugging
more of a pain (at least for me).

I just want an XML/HTML parser that I can rely on stability-wise and that is
written in Ruby so I can actually debug it. In theory it should also make it
easier for other Ruby developers to contribute.

## License

All source code in this repository is subject to the terms of the Mozilla Public
License, version 2.0 unless stated otherwise. A copy of this license can be
found in the file "LICENSE" or at <https://www.mozilla.org/MPL/2.0/>.

[nokogiri]: https://github.com/sparklemotion/nokogiri
[oga-wikipedia]: https://en.wikipedia.org/wiki/Japanese_saw#Other_Japanese_saws
[ox]: https://github.com/ohler55/ox
[doc-website]: http://code.yorickpeterse.com/oga/latest/
[semver]: http://semver.org/spec/v2.0.0.html