nokogiri 1.0.0 → 1.6.8.1

Files changed (309)
  1. checksums.yaml +7 -0
  2. data/.autotest +26 -0
  3. data/.cross_rubies +9 -0
  4. data/.editorconfig +17 -0
  5. data/.gemtest +0 -0
  6. data/.travis.yml +51 -0
  7. data/CHANGELOG.rdoc +1160 -0
  8. data/CONTRIBUTING.md +42 -0
  9. data/C_CODING_STYLE.rdoc +33 -0
  10. data/Gemfile +22 -0
  11. data/LICENSE.txt +31 -0
  12. data/Manifest.txt +284 -40
  13. data/README.md +166 -0
  14. data/ROADMAP.md +111 -0
  15. data/Rakefile +310 -199
  16. data/STANDARD_RESPONSES.md +47 -0
  17. data/Y_U_NO_GEMSPEC.md +155 -0
  18. data/appveyor.yml +22 -0
  19. data/bin/nokogiri +118 -0
  20. data/build_all +45 -0
  21. data/dependencies.yml +29 -0
  22. data/ext/nokogiri/depend +358 -0
  23. data/ext/nokogiri/extconf.rb +664 -34
  24. data/ext/nokogiri/html_document.c +120 -33
  25. data/ext/nokogiri/html_document.h +1 -1
  26. data/ext/nokogiri/html_element_description.c +279 -0
  27. data/ext/nokogiri/html_element_description.h +10 -0
  28. data/ext/nokogiri/html_entity_lookup.c +32 -0
  29. data/ext/nokogiri/html_entity_lookup.h +8 -0
  30. data/ext/nokogiri/html_sax_parser_context.c +116 -0
  31. data/ext/nokogiri/html_sax_parser_context.h +11 -0
  32. data/ext/nokogiri/html_sax_push_parser.c +87 -0
  33. data/ext/nokogiri/html_sax_push_parser.h +9 -0
  34. data/ext/nokogiri/nokogiri.c +145 -0
  35. data/ext/nokogiri/nokogiri.h +131 -0
  36. data/ext/nokogiri/xml_attr.c +94 -0
  37. data/ext/nokogiri/xml_attr.h +9 -0
  38. data/ext/nokogiri/xml_attribute_decl.c +70 -0
  39. data/ext/nokogiri/xml_attribute_decl.h +9 -0
  40. data/ext/nokogiri/xml_cdata.c +23 -19
  41. data/ext/nokogiri/xml_cdata.h +1 -1
  42. data/ext/nokogiri/xml_comment.c +69 -0
  43. data/ext/nokogiri/xml_comment.h +9 -0
  44. data/ext/nokogiri/xml_document.c +501 -54
  45. data/ext/nokogiri/xml_document.h +14 -1
  46. data/ext/nokogiri/xml_document_fragment.c +48 -0
  47. data/ext/nokogiri/xml_document_fragment.h +10 -0
  48. data/ext/nokogiri/xml_dtd.c +109 -24
  49. data/ext/nokogiri/xml_dtd.h +3 -1
  50. data/ext/nokogiri/xml_element_content.c +123 -0
  51. data/ext/nokogiri/xml_element_content.h +10 -0
  52. data/ext/nokogiri/xml_element_decl.c +69 -0
  53. data/ext/nokogiri/xml_element_decl.h +9 -0
  54. data/ext/nokogiri/xml_encoding_handler.c +79 -0
  55. data/ext/nokogiri/xml_encoding_handler.h +8 -0
  56. data/ext/nokogiri/xml_entity_decl.c +110 -0
  57. data/ext/nokogiri/xml_entity_decl.h +10 -0
  58. data/ext/nokogiri/xml_entity_reference.c +52 -0
  59. data/ext/nokogiri/xml_entity_reference.h +9 -0
  60. data/ext/nokogiri/xml_io.c +60 -0
  61. data/ext/nokogiri/xml_io.h +11 -0
  62. data/ext/nokogiri/xml_libxml2_hacks.c +112 -0
  63. data/ext/nokogiri/xml_libxml2_hacks.h +12 -0
  64. data/ext/nokogiri/xml_namespace.c +117 -0
  65. data/ext/nokogiri/xml_namespace.h +13 -0
  66. data/ext/nokogiri/xml_node.c +1285 -315
  67. data/ext/nokogiri/xml_node.h +4 -6
  68. data/ext/nokogiri/xml_node_set.c +415 -54
  69. data/ext/nokogiri/xml_node_set.h +6 -2
  70. data/ext/nokogiri/xml_processing_instruction.c +56 -0
  71. data/ext/nokogiri/xml_processing_instruction.h +9 -0
  72. data/ext/nokogiri/xml_reader.c +316 -77
  73. data/ext/nokogiri/xml_reader.h +1 -1
  74. data/ext/nokogiri/xml_relax_ng.c +161 -0
  75. data/ext/nokogiri/xml_relax_ng.h +9 -0
  76. data/ext/nokogiri/xml_sax_parser.c +215 -80
  77. data/ext/nokogiri/xml_sax_parser.h +30 -1
  78. data/ext/nokogiri/xml_sax_parser_context.c +262 -0
  79. data/ext/nokogiri/xml_sax_parser_context.h +10 -0
  80. data/ext/nokogiri/xml_sax_push_parser.c +115 -0
  81. data/ext/nokogiri/xml_sax_push_parser.h +9 -0
  82. data/ext/nokogiri/xml_schema.c +205 -0
  83. data/ext/nokogiri/xml_schema.h +9 -0
  84. data/ext/nokogiri/xml_syntax_error.c +45 -175
  85. data/ext/nokogiri/xml_syntax_error.h +4 -2
  86. data/ext/nokogiri/xml_text.c +37 -14
  87. data/ext/nokogiri/xml_text.h +1 -1
  88. data/ext/nokogiri/xml_xpath_context.c +230 -13
  89. data/ext/nokogiri/xml_xpath_context.h +2 -1
  90. data/ext/nokogiri/xslt_stylesheet.c +196 -34
  91. data/ext/nokogiri/xslt_stylesheet.h +6 -1
  92. data/lib/nokogiri/css/node.rb +18 -61
  93. data/lib/nokogiri/css/parser.rb +725 -17
  94. data/lib/nokogiri/css/parser.y +126 -63
  95. data/lib/nokogiri/css/parser_extras.rb +91 -0
  96. data/lib/nokogiri/css/syntax_error.rb +7 -0
  97. data/lib/nokogiri/css/tokenizer.rb +148 -5
  98. data/lib/nokogiri/css/tokenizer.rex +31 -39
  99. data/lib/nokogiri/css/xpath_visitor.rb +109 -51
  100. data/lib/nokogiri/css.rb +24 -3
  101. data/lib/nokogiri/decorators/slop.rb +42 -0
  102. data/lib/nokogiri/html/builder.rb +27 -1
  103. data/lib/nokogiri/html/document.rb +329 -3
  104. data/lib/nokogiri/html/document_fragment.rb +39 -0
  105. data/lib/nokogiri/html/element_description.rb +23 -0
  106. data/lib/nokogiri/html/element_description_defaults.rb +671 -0
  107. data/lib/nokogiri/html/entity_lookup.rb +13 -0
  108. data/lib/nokogiri/html/sax/parser.rb +35 -4
  109. data/lib/nokogiri/html/sax/parser_context.rb +16 -0
  110. data/lib/nokogiri/html/sax/push_parser.rb +36 -0
  111. data/lib/nokogiri/html.rb +18 -76
  112. data/lib/nokogiri/syntax_error.rb +4 -0
  113. data/lib/nokogiri/version.rb +106 -1
  114. data/lib/nokogiri/xml/attr.rb +14 -0
  115. data/lib/nokogiri/xml/attribute_decl.rb +18 -0
  116. data/lib/nokogiri/xml/builder.rb +395 -31
  117. data/lib/nokogiri/xml/cdata.rb +4 -2
  118. data/lib/nokogiri/xml/character_data.rb +7 -0
  119. data/lib/nokogiri/xml/document.rb +267 -12
  120. data/lib/nokogiri/xml/document_fragment.rb +149 -0
  121. data/lib/nokogiri/xml/dtd.rb +27 -1
  122. data/lib/nokogiri/xml/element_content.rb +36 -0
  123. data/lib/nokogiri/xml/element_decl.rb +13 -0
  124. data/lib/nokogiri/xml/entity_decl.rb +19 -0
  125. data/lib/nokogiri/xml/namespace.rb +13 -0
  126. data/lib/nokogiri/xml/node/save_options.rb +61 -0
  127. data/lib/nokogiri/xml/node.rb +748 -109
  128. data/lib/nokogiri/xml/node_set.rb +200 -72
  129. data/lib/nokogiri/xml/parse_options.rb +120 -0
  130. data/lib/nokogiri/xml/pp/character_data.rb +18 -0
  131. data/lib/nokogiri/xml/pp/node.rb +56 -0
  132. data/lib/nokogiri/xml/pp.rb +2 -0
  133. data/lib/nokogiri/xml/processing_instruction.rb +8 -0
  134. data/lib/nokogiri/xml/reader.rb +102 -4
  135. data/lib/nokogiri/xml/relax_ng.rb +32 -0
  136. data/lib/nokogiri/xml/sax/document.rb +114 -2
  137. data/lib/nokogiri/xml/sax/parser.rb +97 -7
  138. data/lib/nokogiri/xml/sax/parser_context.rb +16 -0
  139. data/lib/nokogiri/xml/sax/push_parser.rb +60 -0
  140. data/lib/nokogiri/xml/sax.rb +2 -7
  141. data/lib/nokogiri/xml/schema.rb +63 -0
  142. data/lib/nokogiri/xml/searchable.rb +221 -0
  143. data/lib/nokogiri/xml/syntax_error.rb +27 -1
  144. data/lib/nokogiri/xml/text.rb +4 -1
  145. data/lib/nokogiri/xml/xpath/syntax_error.rb +11 -0
  146. data/lib/nokogiri/xml/xpath.rb +4 -0
  147. data/lib/nokogiri/xml/xpath_context.rb +3 -1
  148. data/lib/nokogiri/xml.rb +45 -38
  149. data/lib/nokogiri/xslt/stylesheet.rb +19 -0
  150. data/lib/nokogiri/xslt.rb +47 -2
  151. data/lib/nokogiri.rb +117 -24
  152. data/lib/xsd/xmlparser/nokogiri.rb +102 -0
  153. data/patches/sort-patches-by-date +25 -0
  154. data/ports/archives/libxml2-2.9.4.tar.gz +0 -0
  155. data/ports/archives/libxslt-1.1.29.tar.gz +0 -0
  156. data/suppressions/README.txt +1 -0
  157. data/suppressions/nokogiri_ree-1.8.7.358.supp +61 -0
  158. data/suppressions/nokogiri_ruby-1.8.7.370.supp +0 -0
  159. data/suppressions/nokogiri_ruby-1.9.2.320.supp +28 -0
  160. data/suppressions/nokogiri_ruby-1.9.3.327.supp +28 -0
  161. data/tasks/test.rb +100 -0
  162. data/test/css/test_nthiness.rb +73 -6
  163. data/test/css/test_parser.rb +184 -39
  164. data/test/css/test_tokenizer.rb +72 -19
  165. data/test/css/test_xpath_visitor.rb +44 -2
  166. data/test/decorators/test_slop.rb +20 -0
  167. data/test/files/2ch.html +108 -0
  168. data/test/files/GH_1042.html +18 -0
  169. data/test/files/address_book.rlx +12 -0
  170. data/test/files/address_book.xml +10 -0
  171. data/test/files/atom.xml +344 -0
  172. data/test/files/bar/bar.xsd +4 -0
  173. data/test/files/bogus.xml +0 -0
  174. data/test/files/dont_hurt_em_why.xml +422 -0
  175. data/test/files/encoding.html +82 -0
  176. data/test/files/encoding.xhtml +84 -0
  177. data/test/files/exslt.xml +8 -0
  178. data/test/files/exslt.xslt +35 -0
  179. data/test/files/foo/foo.xsd +4 -0
  180. data/test/files/metacharset.html +10 -0
  181. data/test/files/namespace_pressure_test.xml +1684 -0
  182. data/test/files/noencoding.html +47 -0
  183. data/test/files/po.xml +32 -0
  184. data/test/files/po.xsd +66 -0
  185. data/test/files/saml/saml20assertion_schema.xsd +283 -0
  186. data/test/files/saml/saml20protocol_schema.xsd +302 -0
  187. data/test/files/saml/xenc_schema.xsd +146 -0
  188. data/test/files/saml/xmldsig_schema.xsd +318 -0
  189. data/test/files/shift_jis.html +10 -0
  190. data/test/files/shift_jis.xml +5 -0
  191. data/test/files/shift_jis_no_charset.html +9 -0
  192. data/test/files/slow-xpath.xml +25509 -0
  193. data/test/files/snuggles.xml +3 -0
  194. data/test/files/staff.dtd +10 -0
  195. data/test/files/test_document_url/bar.xml +2 -0
  196. data/test/files/test_document_url/document.dtd +4 -0
  197. data/test/files/test_document_url/document.xml +6 -0
  198. data/test/files/tlm.html +2 -1
  199. data/test/files/to_be_xincluded.xml +2 -0
  200. data/test/files/valid_bar.xml +2 -0
  201. data/test/files/xinclude.xml +4 -0
  202. data/test/helper.rb +124 -13
  203. data/test/html/sax/test_parser.rb +118 -4
  204. data/test/html/sax/test_parser_context.rb +46 -0
  205. data/test/html/sax/test_push_parser.rb +87 -0
  206. data/test/html/test_builder.rb +94 -8
  207. data/test/html/test_document.rb +626 -11
  208. data/test/html/test_document_encoding.rb +145 -0
  209. data/test/html/test_document_fragment.rb +301 -0
  210. data/test/html/test_element_description.rb +105 -0
  211. data/test/html/test_named_characters.rb +14 -0
  212. data/test/html/test_node.rb +212 -0
  213. data/test/html/test_node_encoding.rb +85 -0
  214. data/test/namespaces/test_additional_namespaces_in_builder_doc.rb +14 -0
  215. data/test/namespaces/test_namespaces_aliased_default.rb +24 -0
  216. data/test/namespaces/test_namespaces_in_builder_doc.rb +75 -0
  217. data/test/namespaces/test_namespaces_in_cloned_doc.rb +31 -0
  218. data/test/namespaces/test_namespaces_in_created_doc.rb +75 -0
  219. data/test/namespaces/test_namespaces_in_parsed_doc.rb +80 -0
  220. data/test/namespaces/test_namespaces_preservation.rb +31 -0
  221. data/test/test_convert_xpath.rb +2 -47
  222. data/test/test_css_cache.rb +45 -0
  223. data/test/test_encoding_handler.rb +48 -0
  224. data/test/test_memory_leak.rb +156 -0
  225. data/test/test_nokogiri.rb +103 -1
  226. data/test/test_soap4r_sax.rb +52 -0
  227. data/test/test_xslt_transforms.rb +293 -8
  228. data/test/xml/node/test_save_options.rb +28 -0
  229. data/test/xml/node/test_subclass.rb +44 -0
  230. data/test/xml/sax/test_parser.rb +309 -8
  231. data/test/xml/sax/test_parser_context.rb +115 -0
  232. data/test/xml/sax/test_push_parser.rb +157 -0
  233. data/test/xml/test_attr.rb +67 -0
  234. data/test/xml/test_attribute_decl.rb +86 -0
  235. data/test/xml/test_builder.rb +327 -2
  236. data/test/xml/test_c14n.rb +180 -0
  237. data/test/xml/test_cdata.rb +32 -2
  238. data/test/xml/test_comment.rb +40 -0
  239. data/test/xml/test_document.rb +846 -35
  240. data/test/xml/test_document_encoding.rb +31 -0
  241. data/test/xml/test_document_fragment.rb +271 -0
  242. data/test/xml/test_dtd.rb +153 -9
  243. data/test/xml/test_dtd_encoding.rb +31 -0
  244. data/test/xml/test_element_content.rb +56 -0
  245. data/test/xml/test_element_decl.rb +73 -0
  246. data/test/xml/test_entity_decl.rb +122 -0
  247. data/test/xml/test_entity_reference.rb +251 -0
  248. data/test/xml/test_namespace.rb +96 -0
  249. data/test/xml/test_node.rb +1126 -105
  250. data/test/xml/test_node_attributes.rb +115 -0
  251. data/test/xml/test_node_encoding.rb +69 -0
  252. data/test/xml/test_node_inheritance.rb +32 -0
  253. data/test/xml/test_node_reparenting.rb +549 -0
  254. data/test/xml/test_node_set.rb +668 -9
  255. data/test/xml/test_parse_options.rb +64 -0
  256. data/test/xml/test_processing_instruction.rb +30 -0
  257. data/test/xml/test_reader.rb +589 -0
  258. data/test/xml/test_reader_encoding.rb +134 -0
  259. data/test/xml/test_relax_ng.rb +60 -0
  260. data/test/xml/test_schema.rb +142 -0
  261. data/test/xml/test_syntax_error.rb +30 -0
  262. data/test/xml/test_text.rb +49 -2
  263. data/test/xml/test_unparented_node.rb +440 -0
  264. data/test/xml/test_xinclude.rb +83 -0
  265. data/test/xml/test_xpath.rb +445 -0
  266. data/test/xslt/test_custom_functions.rb +133 -0
  267. data/test/xslt/test_exception_handling.rb +37 -0
  268. data/test_all +107 -0
  269. metadata +459 -115
  270. data/History.txt +0 -6
  271. data/README.ja.txt +0 -86
  272. data/README.txt +0 -87
  273. data/ext/nokogiri/html_sax_parser.c +0 -32
  274. data/ext/nokogiri/html_sax_parser.h +0 -11
  275. data/ext/nokogiri/native.c +0 -40
  276. data/ext/nokogiri/native.h +0 -51
  277. data/ext/nokogiri/xml_xpath.c +0 -46
  278. data/ext/nokogiri/xml_xpath.h +0 -11
  279. data/lib/nokogiri/css/generated_parser.rb +0 -653
  280. data/lib/nokogiri/css/generated_tokenizer.rb +0 -159
  281. data/lib/nokogiri/decorators/hpricot/node.rb +0 -58
  282. data/lib/nokogiri/decorators/hpricot/node_set.rb +0 -14
  283. data/lib/nokogiri/decorators/hpricot/xpath_visitor.rb +0 -17
  284. data/lib/nokogiri/decorators/hpricot.rb +0 -3
  285. data/lib/nokogiri/decorators.rb +0 -1
  286. data/lib/nokogiri/hpricot.rb +0 -47
  287. data/lib/nokogiri/xml/after_handler.rb +0 -18
  288. data/lib/nokogiri/xml/before_handler.rb +0 -32
  289. data/lib/nokogiri/xml/element.rb +0 -6
  290. data/lib/nokogiri/xml/entity_declaration.rb +0 -9
  291. data/nokogiri.gemspec +0 -34
  292. data/test/hpricot/files/basic.xhtml +0 -17
  293. data/test/hpricot/files/boingboing.html +0 -2266
  294. data/test/hpricot/files/cy0.html +0 -3653
  295. data/test/hpricot/files/immob.html +0 -400
  296. data/test/hpricot/files/pace_application.html +0 -1320
  297. data/test/hpricot/files/tenderlove.html +0 -16
  298. data/test/hpricot/files/uswebgen.html +0 -220
  299. data/test/hpricot/files/utf8.html +0 -1054
  300. data/test/hpricot/files/week9.html +0 -1723
  301. data/test/hpricot/files/why.xml +0 -19
  302. data/test/hpricot/load_files.rb +0 -7
  303. data/test/hpricot/test_alter.rb +0 -67
  304. data/test/hpricot/test_builder.rb +0 -27
  305. data/test/hpricot/test_parser.rb +0 -423
  306. data/test/hpricot/test_paths.rb +0 -15
  307. data/test/hpricot/test_preserved.rb +0 -78
  308. data/test/hpricot/test_xml.rb +0 -30
  309. data/test/test_reader.rb +0 -222
data/README.md ADDED
@@ -0,0 +1,166 @@
+ # Nokogiri
+
+ * http://nokogiri.org
+ * Installation: http://nokogiri.org/tutorials/installing_nokogiri.html
+ * Tutorials: http://nokogiri.org
+ * README: https://github.com/sparklemotion/nokogiri
+ * Mailing List: https://groups.google.com/group/nokogiri-talk
+ * Bug Reports: https://github.com/sparklemotion/nokogiri/issues
+
+
+ ## Status
+
+ [![Travis Build Status](https://travis-ci.org/sparklemotion/nokogiri.svg?branch=master)](https://travis-ci.org/sparklemotion/nokogiri)
+ [![Appveyor Build Status](https://ci.appveyor.com/api/projects/status/github/sparklemotion/nokogiri?branch=master&svg=true)](https://ci.appveyor.com/project/flavorjones/nokogiri?branch=master)
+ [![Code Climate](https://codeclimate.com/github/sparklemotion/nokogiri.png)](https://codeclimate.com/github/sparklemotion/nokogiri)
+ [![Version Eye](https://www.versioneye.com/ruby/nokogiri/badge.png)](https://www.versioneye.com/ruby/nokogiri)
+
+
+ ## Description
+
+ Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among
+ Nokogiri's many features is the ability to search documents via XPath
+ or CSS3 selectors.
+
+
+ ## Features
+
+ * XML/HTML DOM parser which handles broken HTML
+ * XML/HTML SAX parser
+ * XML/HTML Push parser
+ * XPath 1.0 support for document searching
+ * CSS3 selector support for document searching
+ * XML/HTML builder
+ * XSLT transformer
+
+ Nokogiri parses and searches XML/HTML using native libraries (either C
+ or Java, depending on your Ruby), which means it's fast and
+ standards-compliant.
+
+
+ ## Installation
+
+ If this doesn't work:
+
+ ```
+ gem install nokogiri
+ ```
+
+ then please start troubleshooting here:
+
+ > http://www.nokogiri.org/tutorials/installing_nokogiri.html
+
+ There are currently 1,237 Stack Overflow questions about Nokogiri
+ installation. The vast majority of them are out of date and therefore
+ incorrect. __Please do not use Stack Overflow.__
+
+ Instead, [tell us](http://nokogiri.org/tutorials/getting_help.html)
+ when the above instructions don't work for you. This allows us to both
+ help you directly and improve the documentation.
+
+
+ ### Binary packages
+
+ Binary packages are available for some distributions.
+
+ * Debian: https://packages.debian.org/sid/ruby-nokogiri
+ * SuSE: https://download.opensuse.org/repositories/devel:/languages:/ruby:/extensions/
+ * Fedora: http://s390.koji.fedoraproject.org/koji/packageinfo?packageID=6756
+
+
+ ## Support
+
+ There are open-source tutorials (to which we invite contributions!) here: http://nokogiri.org/tutorials
+
+ * The Nokogiri mailing list is active: https://groups.google.com/group/nokogiri-talk
+ * The Nokogiri bug tracker is here: https://github.com/sparklemotion/nokogiri/issues
+ * Before filing a bug report, please read our submission guidelines: http://nokogiri.org/tutorials/getting_help.html
+ * The IRC channel is #nokogiri on freenode.
+
+
+ ## Synopsis
+
+ Nokogiri is a large library, but here is example usage for parsing and examining a document:
+
+ ```ruby
+ #! /usr/bin/env ruby
+
+ require 'nokogiri'
+ require 'open-uri'
+
+ # Fetch and parse HTML document
+ doc = Nokogiri::HTML(open('http://www.nokogiri.org/tutorials/installing_nokogiri.html'))
+
+ puts "### Search for nodes by css"
+ doc.css('nav ul.menu li a', 'article h2').each do |link|
+   puts link.content
+ end
+
+ puts "### Search for nodes by xpath"
+ doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
+   puts link.content
+ end
+
+ puts "### Or mix and match."
+ doc.search('nav ul.menu li a', '//article//h2').each do |link|
+   puts link.content
+ end
+ ```
+
+
+ ## Requirements
+
+ * Ruby 1.9.3 or higher, including any development packages necessary
+   to compile native extensions.
+
+ * In Nokogiri 1.6.0 and later libxml2 and libxslt are bundled with the
+   gem, but if you want to use the system versions:
+
+   * at install time, set the environment variable
+     `NOKOGIRI_USE_SYSTEM_LIBRARIES` or else use the
+     `--use-system-libraries` argument. (See
+     http://nokogiri.org/tutorials/installing_nokogiri.html#using_your_system_libraries
+     for specifics.)
+
+   * libxml2 >=2.6.21 with iconv support
+     (libxml2-dev/-devel is also required)
+
+   * libxslt, built with and supported by the given libxml2
+     (libxslt-dev/-devel is also required)
+
+
+ ## Encoding
+
+ Strings are always stored as UTF-8 internally. Methods that return
+ text values will always return UTF-8 encoded strings. Methods that
+ return a string containing markup (like `to_xml`, `to_html` and
+ `inner_html`) will return a string encoded like the source document.
+
+ __WARNING__
+
+ Some documents declare one encoding, but actually use a different
+ one. In these cases, which encoding should the parser choose?
+
+ Data is just a stream of bytes. Humans add meaning to that stream. Any
+ particular set of bytes could be valid characters in multiple
+ encodings, so detecting encoding with 100% accuracy is not
+ possible. `libxml2` does its best, but it can't be right all the time.
+
+ If you want Nokogiri to handle the document encoding properly, your
+ best bet is to explicitly set the encoding. Here is an example of
+ explicitly setting the encoding to EUC-JP on the parser:
+
+ ```ruby
+ doc = Nokogiri.XML('<foo><bar /><foo>', nil, 'EUC-JP')
+ ```
+
+ ## Development
+
+ ```bash
+ bundle install
+ bundle exec rake
+ ```
+
+ ## License
+
+ MIT. See the `LICENSE.txt` file.
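
Reviewer note on the Encoding section of the README above: the claim that the same bytes can be valid in more than one encoding can be checked with plain Ruby strings, without Nokogiri or libxml2. The sketch below is only an illustration of that claim; the byte sequence and variable names are chosen for the example and do not come from the gem.

```ruby
# The same two bytes, interpreted under two different encodings.
bytes = "\xC3\xA9".b # raw bytes, tagged as binary (ASCII-8BIT)

as_utf8   = bytes.dup.force_encoding(Encoding::UTF_8)
as_latin1 = bytes.dup.force_encoding(Encoding::ISO_8859_1)

# Both interpretations are valid, so validity alone cannot tell
# a detector which encoding the author intended.
puts as_utf8.valid_encoding?   # true
puts as_latin1.valid_encoding? # true

# Yet they decode to different text: one character versus two.
puts as_utf8.length   # 1
puts as_latin1.length # 2
puts as_latin1.encode(Encoding::UTF_8) # Ã©
```

This is why the README recommends passing the encoding explicitly to the parser rather than relying on detection.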
data/ROADMAP.md ADDED
@@ -0,0 +1,111 @@
+ # Roadmap for API Changes
+
+ ## overhaul serialize/pretty printing API
+
+ * https://github.com/sparklemotion/nokogiri/issues/530
+   XHTML formatting can't be turned off
+
+ * https://github.com/sparklemotion/nokogiri/issues/415
+   XML formatting should be no formatting
+
+
+ ## overhaul and optimize the SAX parsing
+
+ * see fairy wing throwdown - SAX parsing is wicked slow.
+
+
+ ## Node should not be Enumerable; and should have a better attributes API
+
+ * https://github.com/sparklemotion/nokogiri/issues/679
+   Mixing in Enumerable has some unintended consequences; plus we want to improve the attributes API
+
+ * Some ideas for a better attributes API?
+   * (closed) https://github.com/sparklemotion/nokogiri/issues/666
+   * https://github.com/sparklemotion/nokogiri/issues/765
+
+
+ ## improve CSS query parsing
+
+ * https://github.com/sparklemotion/nokogiri/issues/528
+   support `:not()` with a nontrivial argument, like `:not(div p.c)`
+
+ * https://github.com/sparklemotion/nokogiri/issues/451
+   chained :not pseudoselectors
+
+ * better jQuery selector and CSS pseudo-selector support:
+   * https://github.com/sparklemotion/nokogiri/issues/621
+   * https://github.com/sparklemotion/nokogiri/issues/342
+   * https://github.com/sparklemotion/nokogiri/issues/628
+   * https://github.com/sparklemotion/nokogiri/issues/652
+   * https://github.com/sparklemotion/nokogiri/issues/688
+
+ * https://github.com/sparklemotion/nokogiri/issues/394
+   nth-of-type is wrong, and possibly other selectors as well
+
+ * https://github.com/sparklemotion/nokogiri/issues/309
+   incorrect query being executed
+
+ * https://github.com/sparklemotion/nokogiri/issues/350
+   :has is wrong?
+
+
+ ## DocumentFragment
+
+ * there are a few tickets about searches not working properly if you
+   use or do not use the context node as part of the search.
+   - https://github.com/sparklemotion/nokogiri/issues/213
+   - https://github.com/sparklemotion/nokogiri/issues/370
+   - https://github.com/sparklemotion/nokogiri/issues/454
+   - https://github.com/sparklemotion/nokogiri/issues/572
+   could we fix this by making DocumentFragment be a subclass of NodeSet?
+
+
+ ## Better Syntax for custom XPath function handler
+
+ * https://github.com/sparklemotion/nokogiri/pull/464
+
+
+ ## Better Syntax around Node#xpath and NodeSet#xpath
+
+ * look at those methods, and use of Node#extract_params in Node#{css,search}
+   * we should standardize on a hash of options for these and other calls
+ * what should NodeSet#xpath return?
+   * https://github.com/sparklemotion/nokogiri/issues/656
+
+ ## Encoding
+
+ We have a lot of issues open around encoding. How bad are things?
+ Somebody who knows encoding well should head this up.
+
+ * Extract EncodingReader as a real object that can be injected
+   https://groups.google.com/forum/#!msg/nokogiri-talk/arJeAtMqvkg/tGihB-iBRSAJ
+
+
+ ## Reader
+
+ It's fundamentally broken, in that we can't stop people from crashing
+ their application if they want to use object reference unsafely.
+
+
+ ## Class methods that require Document
+
+ There are a few methods, like `Nokogiri::XML::Comment.new` that
+ require a Document object.
+
+ We should probably make Document instance methods to wrap this, since
+ it's a non-obvious expectation and thus fails as a convention.
+
+ So, instead, let's make alternative methods like
+ `Nokogiri::XML::Document#new_comment`, and recommend those as the
+ proper convention.
+
+
+ ## `collect_namespaces` is just broken
+
+ `collect_namespaces` is returning a hash, which means it can't return
+ namespaces with the same prefix. See this issue for background:
+
+ > https://github.com/sparklemotion/nokogiri/issues/885
+
+ Do we care? This seems like a useless method, but then again I hate
+ XML, so what do I know?
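
Reviewer note on the last ROADMAP item: the limitation it describes comes from Hash semantics, not from anything specific to XML, so plain Ruby is enough to show it. The sketch below does not call Nokogiri. The prefix/URI pairs are hypothetical, and the `xmlns:`-style keys are an assumption about how such a hash might be keyed.

```ruby
# Hypothetical namespace declarations found at different points in one
# document. The prefixes and URIs are made up for illustration.
declarations = [
  ["xmlns:a", "http://example.com/first"],
  ["xmlns:b", "http://example.com/other"],
  ["xmlns:a", "http://example.com/second"], # same prefix, different URI
]

# Collapsing the pairs into a Hash keeps only the last URI per prefix,
# so the first "xmlns:a" declaration is silently lost.
as_hash = declarations.to_h
puts as_hash.size       # 2
puts as_hash["xmlns:a"] # http://example.com/second

# Grouping by prefix instead keeps every URI that was declared.
grouped = declarations
  .group_by(&:first)
  .transform_values { |pairs| pairs.map(&:last) }
puts grouped["xmlns:a"].inspect
```

A return value shaped like `grouped`, or a plain list of pairs, is one way such a method could report repeated prefixes. That is a design option sketched here, not something the roadmap commits to.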