nokogiri 1.0.0 → 1.6.8.1
- checksums.yaml +7 -0
- data/.autotest +26 -0
- data/.cross_rubies +9 -0
- data/.editorconfig +17 -0
- data/.gemtest +0 -0
- data/.travis.yml +51 -0
- data/CHANGELOG.rdoc +1160 -0
- data/CONTRIBUTING.md +42 -0
- data/C_CODING_STYLE.rdoc +33 -0
- data/Gemfile +22 -0
- data/LICENSE.txt +31 -0
- data/Manifest.txt +284 -40
- data/README.md +166 -0
- data/ROADMAP.md +111 -0
- data/Rakefile +310 -199
- data/STANDARD_RESPONSES.md +47 -0
- data/Y_U_NO_GEMSPEC.md +155 -0
- data/appveyor.yml +22 -0
- data/bin/nokogiri +118 -0
- data/build_all +45 -0
- data/dependencies.yml +29 -0
- data/ext/nokogiri/depend +358 -0
- data/ext/nokogiri/extconf.rb +664 -34
- data/ext/nokogiri/html_document.c +120 -33
- data/ext/nokogiri/html_document.h +1 -1
- data/ext/nokogiri/html_element_description.c +279 -0
- data/ext/nokogiri/html_element_description.h +10 -0
- data/ext/nokogiri/html_entity_lookup.c +32 -0
- data/ext/nokogiri/html_entity_lookup.h +8 -0
- data/ext/nokogiri/html_sax_parser_context.c +116 -0
- data/ext/nokogiri/html_sax_parser_context.h +11 -0
- data/ext/nokogiri/html_sax_push_parser.c +87 -0
- data/ext/nokogiri/html_sax_push_parser.h +9 -0
- data/ext/nokogiri/nokogiri.c +145 -0
- data/ext/nokogiri/nokogiri.h +131 -0
- data/ext/nokogiri/xml_attr.c +94 -0
- data/ext/nokogiri/xml_attr.h +9 -0
- data/ext/nokogiri/xml_attribute_decl.c +70 -0
- data/ext/nokogiri/xml_attribute_decl.h +9 -0
- data/ext/nokogiri/xml_cdata.c +23 -19
- data/ext/nokogiri/xml_cdata.h +1 -1
- data/ext/nokogiri/xml_comment.c +69 -0
- data/ext/nokogiri/xml_comment.h +9 -0
- data/ext/nokogiri/xml_document.c +501 -54
- data/ext/nokogiri/xml_document.h +14 -1
- data/ext/nokogiri/xml_document_fragment.c +48 -0
- data/ext/nokogiri/xml_document_fragment.h +10 -0
- data/ext/nokogiri/xml_dtd.c +109 -24
- data/ext/nokogiri/xml_dtd.h +3 -1
- data/ext/nokogiri/xml_element_content.c +123 -0
- data/ext/nokogiri/xml_element_content.h +10 -0
- data/ext/nokogiri/xml_element_decl.c +69 -0
- data/ext/nokogiri/xml_element_decl.h +9 -0
- data/ext/nokogiri/xml_encoding_handler.c +79 -0
- data/ext/nokogiri/xml_encoding_handler.h +8 -0
- data/ext/nokogiri/xml_entity_decl.c +110 -0
- data/ext/nokogiri/xml_entity_decl.h +10 -0
- data/ext/nokogiri/xml_entity_reference.c +52 -0
- data/ext/nokogiri/xml_entity_reference.h +9 -0
- data/ext/nokogiri/xml_io.c +60 -0
- data/ext/nokogiri/xml_io.h +11 -0
- data/ext/nokogiri/xml_libxml2_hacks.c +112 -0
- data/ext/nokogiri/xml_libxml2_hacks.h +12 -0
- data/ext/nokogiri/xml_namespace.c +117 -0
- data/ext/nokogiri/xml_namespace.h +13 -0
- data/ext/nokogiri/xml_node.c +1285 -315
- data/ext/nokogiri/xml_node.h +4 -6
- data/ext/nokogiri/xml_node_set.c +415 -54
- data/ext/nokogiri/xml_node_set.h +6 -2
- data/ext/nokogiri/xml_processing_instruction.c +56 -0
- data/ext/nokogiri/xml_processing_instruction.h +9 -0
- data/ext/nokogiri/xml_reader.c +316 -77
- data/ext/nokogiri/xml_reader.h +1 -1
- data/ext/nokogiri/xml_relax_ng.c +161 -0
- data/ext/nokogiri/xml_relax_ng.h +9 -0
- data/ext/nokogiri/xml_sax_parser.c +215 -80
- data/ext/nokogiri/xml_sax_parser.h +30 -1
- data/ext/nokogiri/xml_sax_parser_context.c +262 -0
- data/ext/nokogiri/xml_sax_parser_context.h +10 -0
- data/ext/nokogiri/xml_sax_push_parser.c +115 -0
- data/ext/nokogiri/xml_sax_push_parser.h +9 -0
- data/ext/nokogiri/xml_schema.c +205 -0
- data/ext/nokogiri/xml_schema.h +9 -0
- data/ext/nokogiri/xml_syntax_error.c +45 -175
- data/ext/nokogiri/xml_syntax_error.h +4 -2
- data/ext/nokogiri/xml_text.c +37 -14
- data/ext/nokogiri/xml_text.h +1 -1
- data/ext/nokogiri/xml_xpath_context.c +230 -13
- data/ext/nokogiri/xml_xpath_context.h +2 -1
- data/ext/nokogiri/xslt_stylesheet.c +196 -34
- data/ext/nokogiri/xslt_stylesheet.h +6 -1
- data/lib/nokogiri/css/node.rb +18 -61
- data/lib/nokogiri/css/parser.rb +725 -17
- data/lib/nokogiri/css/parser.y +126 -63
- data/lib/nokogiri/css/parser_extras.rb +91 -0
- data/lib/nokogiri/css/syntax_error.rb +7 -0
- data/lib/nokogiri/css/tokenizer.rb +148 -5
- data/lib/nokogiri/css/tokenizer.rex +31 -39
- data/lib/nokogiri/css/xpath_visitor.rb +109 -51
- data/lib/nokogiri/css.rb +24 -3
- data/lib/nokogiri/decorators/slop.rb +42 -0
- data/lib/nokogiri/html/builder.rb +27 -1
- data/lib/nokogiri/html/document.rb +329 -3
- data/lib/nokogiri/html/document_fragment.rb +39 -0
- data/lib/nokogiri/html/element_description.rb +23 -0
- data/lib/nokogiri/html/element_description_defaults.rb +671 -0
- data/lib/nokogiri/html/entity_lookup.rb +13 -0
- data/lib/nokogiri/html/sax/parser.rb +35 -4
- data/lib/nokogiri/html/sax/parser_context.rb +16 -0
- data/lib/nokogiri/html/sax/push_parser.rb +36 -0
- data/lib/nokogiri/html.rb +18 -76
- data/lib/nokogiri/syntax_error.rb +4 -0
- data/lib/nokogiri/version.rb +106 -1
- data/lib/nokogiri/xml/attr.rb +14 -0
- data/lib/nokogiri/xml/attribute_decl.rb +18 -0
- data/lib/nokogiri/xml/builder.rb +395 -31
- data/lib/nokogiri/xml/cdata.rb +4 -2
- data/lib/nokogiri/xml/character_data.rb +7 -0
- data/lib/nokogiri/xml/document.rb +267 -12
- data/lib/nokogiri/xml/document_fragment.rb +149 -0
- data/lib/nokogiri/xml/dtd.rb +27 -1
- data/lib/nokogiri/xml/element_content.rb +36 -0
- data/lib/nokogiri/xml/element_decl.rb +13 -0
- data/lib/nokogiri/xml/entity_decl.rb +19 -0
- data/lib/nokogiri/xml/namespace.rb +13 -0
- data/lib/nokogiri/xml/node/save_options.rb +61 -0
- data/lib/nokogiri/xml/node.rb +748 -109
- data/lib/nokogiri/xml/node_set.rb +200 -72
- data/lib/nokogiri/xml/parse_options.rb +120 -0
- data/lib/nokogiri/xml/pp/character_data.rb +18 -0
- data/lib/nokogiri/xml/pp/node.rb +56 -0
- data/lib/nokogiri/xml/pp.rb +2 -0
- data/lib/nokogiri/xml/processing_instruction.rb +8 -0
- data/lib/nokogiri/xml/reader.rb +102 -4
- data/lib/nokogiri/xml/relax_ng.rb +32 -0
- data/lib/nokogiri/xml/sax/document.rb +114 -2
- data/lib/nokogiri/xml/sax/parser.rb +97 -7
- data/lib/nokogiri/xml/sax/parser_context.rb +16 -0
- data/lib/nokogiri/xml/sax/push_parser.rb +60 -0
- data/lib/nokogiri/xml/sax.rb +2 -7
- data/lib/nokogiri/xml/schema.rb +63 -0
- data/lib/nokogiri/xml/searchable.rb +221 -0
- data/lib/nokogiri/xml/syntax_error.rb +27 -1
- data/lib/nokogiri/xml/text.rb +4 -1
- data/lib/nokogiri/xml/xpath/syntax_error.rb +11 -0
- data/lib/nokogiri/xml/xpath.rb +4 -0
- data/lib/nokogiri/xml/xpath_context.rb +3 -1
- data/lib/nokogiri/xml.rb +45 -38
- data/lib/nokogiri/xslt/stylesheet.rb +19 -0
- data/lib/nokogiri/xslt.rb +47 -2
- data/lib/nokogiri.rb +117 -24
- data/lib/xsd/xmlparser/nokogiri.rb +102 -0
- data/patches/sort-patches-by-date +25 -0
- data/ports/archives/libxml2-2.9.4.tar.gz +0 -0
- data/ports/archives/libxslt-1.1.29.tar.gz +0 -0
- data/suppressions/README.txt +1 -0
- data/suppressions/nokogiri_ree-1.8.7.358.supp +61 -0
- data/suppressions/nokogiri_ruby-1.8.7.370.supp +0 -0
- data/suppressions/nokogiri_ruby-1.9.2.320.supp +28 -0
- data/suppressions/nokogiri_ruby-1.9.3.327.supp +28 -0
- data/tasks/test.rb +100 -0
- data/test/css/test_nthiness.rb +73 -6
- data/test/css/test_parser.rb +184 -39
- data/test/css/test_tokenizer.rb +72 -19
- data/test/css/test_xpath_visitor.rb +44 -2
- data/test/decorators/test_slop.rb +20 -0
- data/test/files/2ch.html +108 -0
- data/test/files/GH_1042.html +18 -0
- data/test/files/address_book.rlx +12 -0
- data/test/files/address_book.xml +10 -0
- data/test/files/atom.xml +344 -0
- data/test/files/bar/bar.xsd +4 -0
- data/test/files/bogus.xml +0 -0
- data/test/files/dont_hurt_em_why.xml +422 -0
- data/test/files/encoding.html +82 -0
- data/test/files/encoding.xhtml +84 -0
- data/test/files/exslt.xml +8 -0
- data/test/files/exslt.xslt +35 -0
- data/test/files/foo/foo.xsd +4 -0
- data/test/files/metacharset.html +10 -0
- data/test/files/namespace_pressure_test.xml +1684 -0
- data/test/files/noencoding.html +47 -0
- data/test/files/po.xml +32 -0
- data/test/files/po.xsd +66 -0
- data/test/files/saml/saml20assertion_schema.xsd +283 -0
- data/test/files/saml/saml20protocol_schema.xsd +302 -0
- data/test/files/saml/xenc_schema.xsd +146 -0
- data/test/files/saml/xmldsig_schema.xsd +318 -0
- data/test/files/shift_jis.html +10 -0
- data/test/files/shift_jis.xml +5 -0
- data/test/files/shift_jis_no_charset.html +9 -0
- data/test/files/slow-xpath.xml +25509 -0
- data/test/files/snuggles.xml +3 -0
- data/test/files/staff.dtd +10 -0
- data/test/files/test_document_url/bar.xml +2 -0
- data/test/files/test_document_url/document.dtd +4 -0
- data/test/files/test_document_url/document.xml +6 -0
- data/test/files/tlm.html +2 -1
- data/test/files/to_be_xincluded.xml +2 -0
- data/test/files/valid_bar.xml +2 -0
- data/test/files/xinclude.xml +4 -0
- data/test/helper.rb +124 -13
- data/test/html/sax/test_parser.rb +118 -4
- data/test/html/sax/test_parser_context.rb +46 -0
- data/test/html/sax/test_push_parser.rb +87 -0
- data/test/html/test_builder.rb +94 -8
- data/test/html/test_document.rb +626 -11
- data/test/html/test_document_encoding.rb +145 -0
- data/test/html/test_document_fragment.rb +301 -0
- data/test/html/test_element_description.rb +105 -0
- data/test/html/test_named_characters.rb +14 -0
- data/test/html/test_node.rb +212 -0
- data/test/html/test_node_encoding.rb +85 -0
- data/test/namespaces/test_additional_namespaces_in_builder_doc.rb +14 -0
- data/test/namespaces/test_namespaces_aliased_default.rb +24 -0
- data/test/namespaces/test_namespaces_in_builder_doc.rb +75 -0
- data/test/namespaces/test_namespaces_in_cloned_doc.rb +31 -0
- data/test/namespaces/test_namespaces_in_created_doc.rb +75 -0
- data/test/namespaces/test_namespaces_in_parsed_doc.rb +80 -0
- data/test/namespaces/test_namespaces_preservation.rb +31 -0
- data/test/test_convert_xpath.rb +2 -47
- data/test/test_css_cache.rb +45 -0
- data/test/test_encoding_handler.rb +48 -0
- data/test/test_memory_leak.rb +156 -0
- data/test/test_nokogiri.rb +103 -1
- data/test/test_soap4r_sax.rb +52 -0
- data/test/test_xslt_transforms.rb +293 -8
- data/test/xml/node/test_save_options.rb +28 -0
- data/test/xml/node/test_subclass.rb +44 -0
- data/test/xml/sax/test_parser.rb +309 -8
- data/test/xml/sax/test_parser_context.rb +115 -0
- data/test/xml/sax/test_push_parser.rb +157 -0
- data/test/xml/test_attr.rb +67 -0
- data/test/xml/test_attribute_decl.rb +86 -0
- data/test/xml/test_builder.rb +327 -2
- data/test/xml/test_c14n.rb +180 -0
- data/test/xml/test_cdata.rb +32 -2
- data/test/xml/test_comment.rb +40 -0
- data/test/xml/test_document.rb +846 -35
- data/test/xml/test_document_encoding.rb +31 -0
- data/test/xml/test_document_fragment.rb +271 -0
- data/test/xml/test_dtd.rb +153 -9
- data/test/xml/test_dtd_encoding.rb +31 -0
- data/test/xml/test_element_content.rb +56 -0
- data/test/xml/test_element_decl.rb +73 -0
- data/test/xml/test_entity_decl.rb +122 -0
- data/test/xml/test_entity_reference.rb +251 -0
- data/test/xml/test_namespace.rb +96 -0
- data/test/xml/test_node.rb +1126 -105
- data/test/xml/test_node_attributes.rb +115 -0
- data/test/xml/test_node_encoding.rb +69 -0
- data/test/xml/test_node_inheritance.rb +32 -0
- data/test/xml/test_node_reparenting.rb +549 -0
- data/test/xml/test_node_set.rb +668 -9
- data/test/xml/test_parse_options.rb +64 -0
- data/test/xml/test_processing_instruction.rb +30 -0
- data/test/xml/test_reader.rb +589 -0
- data/test/xml/test_reader_encoding.rb +134 -0
- data/test/xml/test_relax_ng.rb +60 -0
- data/test/xml/test_schema.rb +142 -0
- data/test/xml/test_syntax_error.rb +30 -0
- data/test/xml/test_text.rb +49 -2
- data/test/xml/test_unparented_node.rb +440 -0
- data/test/xml/test_xinclude.rb +83 -0
- data/test/xml/test_xpath.rb +445 -0
- data/test/xslt/test_custom_functions.rb +133 -0
- data/test/xslt/test_exception_handling.rb +37 -0
- data/test_all +107 -0
- metadata +459 -115
- data/History.txt +0 -6
- data/README.ja.txt +0 -86
- data/README.txt +0 -87
- data/ext/nokogiri/html_sax_parser.c +0 -32
- data/ext/nokogiri/html_sax_parser.h +0 -11
- data/ext/nokogiri/native.c +0 -40
- data/ext/nokogiri/native.h +0 -51
- data/ext/nokogiri/xml_xpath.c +0 -46
- data/ext/nokogiri/xml_xpath.h +0 -11
- data/lib/nokogiri/css/generated_parser.rb +0 -653
- data/lib/nokogiri/css/generated_tokenizer.rb +0 -159
- data/lib/nokogiri/decorators/hpricot/node.rb +0 -58
- data/lib/nokogiri/decorators/hpricot/node_set.rb +0 -14
- data/lib/nokogiri/decorators/hpricot/xpath_visitor.rb +0 -17
- data/lib/nokogiri/decorators/hpricot.rb +0 -3
- data/lib/nokogiri/decorators.rb +0 -1
- data/lib/nokogiri/hpricot.rb +0 -47
- data/lib/nokogiri/xml/after_handler.rb +0 -18
- data/lib/nokogiri/xml/before_handler.rb +0 -32
- data/lib/nokogiri/xml/element.rb +0 -6
- data/lib/nokogiri/xml/entity_declaration.rb +0 -9
- data/nokogiri.gemspec +0 -34
- data/test/hpricot/files/basic.xhtml +0 -17
- data/test/hpricot/files/boingboing.html +0 -2266
- data/test/hpricot/files/cy0.html +0 -3653
- data/test/hpricot/files/immob.html +0 -400
- data/test/hpricot/files/pace_application.html +0 -1320
- data/test/hpricot/files/tenderlove.html +0 -16
- data/test/hpricot/files/uswebgen.html +0 -220
- data/test/hpricot/files/utf8.html +0 -1054
- data/test/hpricot/files/week9.html +0 -1723
- data/test/hpricot/files/why.xml +0 -19
- data/test/hpricot/load_files.rb +0 -7
- data/test/hpricot/test_alter.rb +0 -67
- data/test/hpricot/test_builder.rb +0 -27
- data/test/hpricot/test_parser.rb +0 -423
- data/test/hpricot/test_paths.rb +0 -15
- data/test/hpricot/test_preserved.rb +0 -78
- data/test/hpricot/test_xml.rb +0 -30
- data/test/test_reader.rb +0 -222
data/README.md
ADDED
@@ -0,0 +1,166 @@
# Nokogiri

* http://nokogiri.org
* Installation: http://nokogiri.org/tutorials/installing_nokogiri.html
* Tutorials: http://nokogiri.org
* README: https://github.com/sparklemotion/nokogiri
* Mailing List: https://groups.google.com/group/nokogiri-talk
* Bug Reports: https://github.com/sparklemotion/nokogiri/issues


## Status

[![Travis Build Status](https://travis-ci.org/sparklemotion/nokogiri.svg?branch=master)](https://travis-ci.org/sparklemotion/nokogiri)
[![Appveyor Build Status](https://ci.appveyor.com/api/projects/status/github/sparklemotion/nokogiri?branch=master&svg=true)](https://ci.appveyor.com/project/flavorjones/nokogiri?branch=master)
[![Code Climate](https://codeclimate.com/github/sparklemotion/nokogiri.png)](https://codeclimate.com/github/sparklemotion/nokogiri)
[![Version Eye](https://www.versioneye.com/ruby/nokogiri/badge.png)](https://www.versioneye.com/ruby/nokogiri)


## Description

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among
Nokogiri's many features is the ability to search documents via XPath
or CSS3 selectors.


## Features

* XML/HTML DOM parser which handles broken HTML
* XML/HTML SAX parser
* XML/HTML Push parser
* XPath 1.0 support for document searching
* CSS3 selector support for document searching
* XML/HTML builder
* XSLT transformer

Nokogiri parses and searches XML/HTML using native libraries (either C
or Java, depending on your Ruby), which means it's fast and
standards-compliant.

## Installation

If this doesn't work:

```
gem install nokogiri
```

then please start troubleshooting here:

> http://www.nokogiri.org/tutorials/installing_nokogiri.html

There are currently 1,237 Stack Overflow questions about Nokogiri
installation. The vast majority of them are out of date and therefore
incorrect. __Please do not use Stack Overflow.__

Instead, [tell us](http://nokogiri.org/tutorials/getting_help.html)
when the above instructions don't work for you. This allows us to both
help you directly and improve the documentation.


### Binary packages

Binary packages are available for some distributions.

* Debian: https://packages.debian.org/sid/ruby-nokogiri
* SuSE: https://download.opensuse.org/repositories/devel:/languages:/ruby:/extensions/
* Fedora: http://s390.koji.fedoraproject.org/koji/packageinfo?packageID=6756


## Support

There are open-source tutorials (to which we invite contributions!) here: http://nokogiri.org/tutorials

* The Nokogiri mailing list is active: https://groups.google.com/group/nokogiri-talk
* The Nokogiri bug tracker is here: https://github.com/sparklemotion/nokogiri/issues
* Before filing a bug report, please read our submission guidelines: http://nokogiri.org/tutorials/getting_help.html
* The IRC channel is #nokogiri on freenode.

## Synopsis

Nokogiri is a large library, but here is example usage for parsing and examining a document:

```ruby
#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(open('http://www.nokogiri.org/tutorials/installing_nokogiri.html'))

puts "### Search for nodes by css"
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

puts "### Search for nodes by xpath"
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

puts "### Or mix and match."
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end
```

## Requirements

* Ruby 1.9.3 or higher, including any development packages necessary
  to compile native extensions.

* In Nokogiri 1.6.0 and later libxml2 and libxslt are bundled with the
  gem, but if you want to use the system versions:

  * at install time, set the environment variable
    `NOKOGIRI_USE_SYSTEM_LIBRARIES` or else use the
    `--use-system-libraries` argument. (See
    http://nokogiri.org/tutorials/installing_nokogiri.html#using_your_system_libraries
    for specifics.)

  * libxml2 >=2.6.21 with iconv support
    (libxml2-dev/-devel is also required)

  * libxslt, built with and supported by the given libxml2
    (libxslt-dev/-devel is also required)

## Encoding

Strings are always stored as UTF-8 internally. Methods that return
text values will always return UTF-8 encoded strings. Methods that
return a string containing markup (like `to_xml`, `to_html` and
`inner_html`) will return a string encoded like the source document.

__WARNING__

Some documents declare one encoding, but actually use a different
one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any
particular set of bytes could be valid characters in multiple
encodings, so detecting encoding with 100% accuracy is not
possible. `libxml2` does its best, but it can't be right all the time.

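The ambiguity is easy to reproduce with plain Ruby strings, no parser involved. In this sketch the same two bytes are labelled first as UTF-8 and then as ISO-8859-1; both labels are valid, but they decode to different text. The byte values are chosen purely for illustration.

```ruby
# Two raw bytes with no encoding assumed.
bytes = "\xC3\xA9".force_encoding('BINARY')

as_utf8   = bytes.dup.force_encoding('UTF-8')
as_latin1 = bytes.dup.force_encoding('ISO-8859-1')

# Both interpretations are valid, so validity alone cannot pick a winner.
puts as_utf8.valid_encoding?    # => true
puts as_latin1.valid_encoding?  # => true

# The interpretations disagree on how many characters there are.
puts as_utf8.length             # => 1
puts as_latin1.length           # => 2

# Transcoding the Latin-1 reading to UTF-8 shows the classic mojibake.
latin1_as_text = as_latin1.encode('UTF-8')
puts latin1_as_text
```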
If you want Nokogiri to handle the document encoding properly, your
best bet is to explicitly set the encoding. Here is an example of
explicitly setting the encoding to EUC-JP on the parser:

```ruby
doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')
```

## Development

```bash
bundle install
bundle exec rake
```

## License

MIT. See the `LICENSE.txt` file.
data/ROADMAP.md
ADDED
@@ -0,0 +1,111 @@
# Roadmap for API Changes

## overhaul serialize/pretty printing API

* https://github.com/sparklemotion/nokogiri/issues/530
  XHTML formatting can't be turned off

* https://github.com/sparklemotion/nokogiri/issues/415
  XML formatting should be no formatting


## overhaul and optimize the SAX parsing

* see fairy wing throwdown - SAX parsing is wicked slow.


## Node should not be Enumerable; and should have a better attributes API

* https://github.com/sparklemotion/nokogiri/issues/679
  Mixing in Enumerable has some unintended consequences; plus we want to improve the attributes API

* Some ideas for a better attributes API?
  * (closed) https://github.com/sparklemotion/nokogiri/issues/666
  * https://github.com/sparklemotion/nokogiri/issues/765

## improve CSS query parsing

* https://github.com/sparklemotion/nokogiri/issues/528
  support `:not()` with a nontrivial argument, like `:not(div p.c)`

* https://github.com/sparklemotion/nokogiri/issues/451
  chained :not pseudoselectors

* better jQuery selector and CSS pseudo-selector support:
  * https://github.com/sparklemotion/nokogiri/issues/621
  * https://github.com/sparklemotion/nokogiri/issues/342
  * https://github.com/sparklemotion/nokogiri/issues/628
  * https://github.com/sparklemotion/nokogiri/issues/652
  * https://github.com/sparklemotion/nokogiri/issues/688

* https://github.com/sparklemotion/nokogiri/issues/394
  nth-of-type is wrong, and possibly other selectors as well

* https://github.com/sparklemotion/nokogiri/issues/309
  incorrect query being executed

* https://github.com/sparklemotion/nokogiri/issues/350
  :has is wrong?


## DocumentFragment

* there are a few tickets about searches not working properly if you
  use or do not use the context node as part of the search.
  - https://github.com/sparklemotion/nokogiri/issues/213
  - https://github.com/sparklemotion/nokogiri/issues/370
  - https://github.com/sparklemotion/nokogiri/issues/454
  - https://github.com/sparklemotion/nokogiri/issues/572
  could we fix this by making DocumentFragment be a subclass of NodeSet?


## Better Syntax for custom XPath function handler

* https://github.com/sparklemotion/nokogiri/pull/464

## Better Syntax around Node#xpath and NodeSet#xpath

* look at those methods, and use of Node#extract_params in Node#{css,search}
  * we should standardize on a hash of options for these and other calls
* what should NodeSet#xpath return?
  * https://github.com/sparklemotion/nokogiri/issues/656

## Encoding

We have a lot of issues open around encoding. How bad are things?
Somebody who knows encoding well should head this up.

* Extract EncodingReader as a real object that can be injected
  https://groups.google.com/forum/#!msg/nokogiri-talk/arJeAtMqvkg/tGihB-iBRSAJ


## Reader

It's fundamentally broken, in that we can't stop people from crashing
their application if they want to use object reference unsafely.


## Class methods that require Document

There are a few methods, like `Nokogiri::XML::Comment.new` that
require a Document object.

We should probably make Document instance methods to wrap this, since
it's a non-obvious expectation and thus fails as a convention.

So, instead, let's make alternative methods like
`Nokogiri::XML::Document#new_comment`, and recommend those as the
proper convention.

## `collect_namespaces` is just broken

`collect_namespaces` is returning a hash, which means it can't return
namespaces with the same prefix. See this issue for background:

> https://github.com/sparklemotion/nokogiri/issues/885

Do we care? This seems like a useless method, but then again I hate
XML, so what do I know?