nokogiri 1.5.6.rc2-java → 1.5.6.rc3-java

Potentially problematic release.


This version of nokogiri might be problematic.

@@ -2,13 +2,30 @@

  * Features

- * Bugfixes
+ * Improved the performance of XML::Document#collect_namespaces. #761 (Thanks, Juergen Mangler!)
+ * Added a new callback, SAX::Document#processing_instruction. (Thanks, Kitaiti Makoto!)
+ * Node#native_content= can now set an unescaped string. #768
+ * Symbol keys can now be used when writing XPath expressions with namespaces. #729 (Thanks, Ben Langfeld.)
+ * XML::Node#[]= now converts the argument it receives to a string. #729 (Thanks, Ben Langfeld.)
+ * The bin/nokogiri command can now read and process a document from $stdin.
+ * bin/nokogiri -e now runs a program given on the command line.
+

- * On JRuby, trying to create an EntityReference whose name starts with '#' raises an INVALID_CHARACTER_ERR exception. #719
- * On JRuby, the namespace of a Node subclass is not correctly converted to a string. #715
- * Nokogiri now detects XSLT transform errors. #731 (Thanks, Justin Fitzsimmons!)
- * Raise an ArgumentError if an invalid encoding is passed to the SAX parser. #756 (Thanks, Bradley Schaefer!)
- * JRuby Node#content now renders newlines properly. #737 (Thanks, Piotr Szmielew!)
+ * Bugfixes
+ * [JRuby] Parsing a document fails when there is whitespace before the XML declaration. (The fix for #748 also resolves this.) #790
+ * [JRuby] The behavior of Nokogiri::XML::Node#content on JRuby is not the same as on CRuby. #794, #797
+ * [JRuby] Trying to create an EntityReference whose name starts with '#' raises an INVALID_CHARACTER_ERR exception. #719
+ * [JRuby] The namespace of a Node subclass is not correctly converted to a string. #715
+ * As of this version, Nokogiri detects XSLT transform errors. #731 (Thanks, Justin Fitzsimmons!)
+ * Raise an ArgumentError when an invalid encoding is passed to the SAX parser. #756 (Thanks, Bradley Schaefer!)
+ * [JRuby] As of this version, Node#content renders newlines correctly. #737 (Thanks, Piotr Szmielew!)
+ * [JRuby] Undeclared namespaces are ignored when the recover option is specified. #748
+ * [JRuby] XPath queries that look up namespaces must not raise an exception when run consecutively. #764
+ * [JRuby] Whitespace handling when emitting XML is now more consistent with the libxml2 version. #771
+ * [JRuby] Adding an XML document that contains namespaced attributes to the builder as a string fails. #770
+ * [JRuby] Trying to add a node with << to a document created with Nokogiri::XML::Document#wrap
+ raises undefined method `length' for nil:NilClass. #781
+ * [JRuby] Trying to close an open file descriptor raises "bad file descriptor". #495


  == 1.5.5 / June 24, 2012
@@ -2,13 +2,29 @@

  * Features

- * Bugfixes
+ * Improved performance of XML::Document#collect_namespaces. #761 (Thanks, Juergen Mangler!)
+ * New callback SAX::Document#processing_instruction (Thanks, Kitaiti Makoto!)
+ * Node#native_content= allows setting unescaped node contant. #768
+ * XPath lookup with namespaces supports symbol keys. #729 (Thanks, Ben Langfeld.)
+ * XML::Node#[]= stringifies values. #729 (Thanks, Ben Langfeld.)
+ * bin/nokogiri will process a document from $stdin
+ * bin/nokogiri -e will execute a program from the command line
+

- * JRuby raises INVALID_CHARACTER_ERR exception when EntityReference name starts with '#'. #719
- * JRuby doesn't coerce namespaces out of strings on a direct subclass of Node. #715
+ * Bugfixes
+ * [JRuby] space prior to xml preamble causes nokogiri to fail parsing. (fixed along with #748) #790
+ * [JRuby] Fixed the bug Nokogiri::XML::Node#content inconsistency between Java and C. #794, #797
+ * [JRuby] raises INVALID_CHARACTER_ERR exception when EntityReference name starts with '#'. #719
+ * [JRuby] doesn't coerce namespaces out of strings on a direct subclass of Node. #715
  * Nokogiri now detects XSLT transform errors. #731 (Thanks, Justin Fitzsimmons!)
  * Raise an ArgumentError if an invalid encoding is passed to the SAX parser. #756 (Thanks, Bradley Schaefer!)
- * JRuby Node#content now renders newlines properly. #737 (Thanks, Piotr Szmielew!)
+ * [JRuby] Node#content now renders newlines properly. #737 (Thanks, Piotr Szmielew!)
+ * [JRuby] Unknown namespace are ignore when the recover option is used. #748
+ * [JRuby] XPath queries for namespaces should not throw exceptions when called twice in a row. #764
+ * [JRuby] More consistent (with libxml2) whitespace formatting when emitting XML. #771
+ * [JRuby] namespaced attributes broken when appending raw xml to builder. #770
+ * [JRuby] Nokogiri::XML::Document#wrap raises undefined method `length' for nil:NilClass when trying to << to a node. #781
+ * [JRuby] Fixed "bad file descriptor" bug when closing open file descriptors. #495


  == 1.5.5 / 2012-06-24
@@ -122,13 +122,10 @@ Developing Nokogiri requires racc and rexical to generate the parser and
  tokenizer. To start development, make sure you have `libxml2` and `libxslt`
  installed.

- Then install hoe and rake-compiler:
+ Then install core gems and bootstrap:

- $ gem install hoe rake-compiler racc rexical minitest
-
- Then run rake:
-
- $ rake
+ $ gem install hoe rake-compiler mini_portile
+ $ rake newb

  === Developing on JRuby

data/ROADMAP.md CHANGED
@@ -19,8 +19,9 @@
  * https://github.com/sparklemotion/nokogiri/issues/679
  Mixing in Enumerable has some unintended consequences; plus we want to improve the attributes API

- * (closed) https://github.com/sparklemotion/nokogiri/issues/666
- Some ideas for a better attributes API?
+ * Some ideas for a better attributes API?
+ * (closed) https://github.com/sparklemotion/nokogiri/issues/666
+ * https://github.com/sparklemotion/nokogiri/issues/765


  ## improve CSS query parsing
data/Rakefile CHANGED
@@ -17,6 +17,8 @@ def java?
    !! (RUBY_PLATFORM =~ /java/)
  end

+ ENV['LANG'] = "en_US.UTF-8" # UBUNTU 10.04, Y U NO DEFAULT TO UTF-8?
+
  require 'tasks/nokogiri.org'

  HOE = Hoe.spec 'nokogiri' do
@@ -118,9 +120,9 @@ task 'gem:spec' => 'generate' if Rake::Task.task_defined?("gem:spec")
  # dependencies in the Gemfile are constrainted to ruby platforms
  # (i.e. MRI and Rubinius). There's no way to do that through hoe,
  # and any solution will require changing hoe and hoe-bundler.
- old_gemfile_task = Rake::Task['bundler:gemfile']
+ old_gemfile_task = Rake::Task['bundler:gemfile'] rescue nil
  task 'bundler:gemfile' do
-   old_gemfile_task.invoke
+   old_gemfile_task.invoke if old_gemfile_task

    lines = File.open('Gemfile', 'r') { |f| f.readlines }.map do |line|
      line =~ /racc|rexical/ ? "#{line.strip}, :platform => :ruby" : line
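
The `rescue nil` guard in the Rakefile hunk above lets the file load even when the `bundler:gemfile` task has not been defined, and the `if old_gemfile_task` check then skips the invocation. A minimal sketch of that pattern follows, assuming the `rake` gem is installed. The helper `lookup_task` and the task names `demo:existing` and `demo:missing` are hypothetical and do not appear in the Rakefile.

```ruby
require 'rake'

# Hypothetical helper: return the named task, or nil when Rake does not know it.
# Rake::Task[] raises for an undefined task, which is the error the
# `rescue nil` modifier in the Rakefile swallows.
def lookup_task(name)
  Rake::Task[name]
rescue StandardError
  nil
end

# Records which demo tasks actually ran.
RAN = []
Rake::Task.define_task('demo:existing') { RAN << :existing }

%w[demo:existing demo:missing].each do |name|
  task = lookup_task(name)
  # Same guard as `old_gemfile_task.invoke if old_gemfile_task`.
  task.invoke if task
end
```

The explicit `rescue StandardError` clause is used here instead of the modifier form only to make the caught error class visible; both forms rescue the same class of errors.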
@@ -16,6 +16,7 @@ opts = OptionParser.new do |opts|
    opts.separator "Examples:"
    opts.separator " nokogiri http://www.ruby-lang.org/"
    opts.separator " nokogiri ./public/index.html"
+   opts.separator " curl -s http://nokogiri.org | nokogiri -e'p $_.css(\"h1\").length'"
    opts.separator ""
    opts.separator "Options:"

@@ -27,6 +28,10 @@ opts = OptionParser.new do |opts|
      encoding = v
    end

+   opts.on("-e command", "Specifies script from command-line.") do |v|
+     @script = v
+   end
+
    opts.on("--rng <uri|path>", "Validate using this rng file.") do |v|
      @rng = open(v) {|f| Nokogiri::XML::RelaxNG(f)}
    end
@@ -45,19 +50,29 @@ opts.parse!

  uri = ARGV.shift

- if uri.to_s.strip.empty?
+ if uri.to_s.strip.empty? && $stdin.tty?
    puts opts
    exit 1
  end

- @doc = parse_class.parse(open(uri).read, nil, encoding)
+ if $stdin.tty?
+   @doc = parse_class.parse(open(uri).read, nil, encoding)
+ else
+   @doc = parse_class.parse($stdin, nil, encoding)
+ end
+
+ $_ = @doc

  if @rng
    @rng.validate(@doc).each do |error|
      puts error.message
    end
  else
-   puts "Your document is stored in @doc..."
-   IRB.start
+   if @script
+     eval @script, binding, '<main>'
+   else
+     puts "Your document is stored in @doc..."
+     IRB.start
+   end
  end
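
The bin/nokogiri hunks above combine two ideas: read the document from `$stdin` when input is piped, and expose the parsed document as `$_` to a script given with `-e`. The sketch below illustrates that control flow without Nokogiri, so the "document" is just the raw string. The helpers `read_source` and `run_script` are hypothetical, and `File.read` stands in for the script's `open(uri).read`.

```ruby
require 'stringio'

# Hypothetical helper: choose the input the way the patched script does.
# Piped input (not a TTY) wins; otherwise a path argument is required.
def read_source(uri, stdin)
  if stdin.tty?
    raise ArgumentError, "no document given" if uri.to_s.strip.empty?
    File.read(uri) # assumption: a local path; the real script also accepts URLs
  else
    stdin.read
  end
end

# Hypothetical helper: make the document available as $_ and evaluate the
# command-line script against the current binding, as the `-e` branch does.
def run_script(doc, script)
  $_ = doc
  eval(script, binding, '<main>')
end

# Usage: a StringIO is never a TTY, so it behaves like piped input.
PIPED_DOC = read_source(nil, StringIO.new("<h1>hi</h1>"))
SCRIPT_RESULT = run_script(PIPED_DOC, '$_.length')
```

Because `eval` runs arbitrary Ruby, this is only appropriate for scripts typed by the person invoking the command, which is the case for a `-e` flag.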
data/build_all CHANGED
@@ -13,6 +13,10 @@
  #
  # as you build, you may run into these problems:
  #
+ # - if you're using Virtualbox shared directories, you'll get a mingw
+ # "Protocol error" at linktime. Boo! Either use NFS or a
+ # locally-checked-out repository.
+ #
  # - on ubuntus 11 and later, you may have issues with building
  # rake-compiler's rubies against openssl v2. Just comment the lines
  # out from ossl_ssl.c and you'll be fine.
@@ -43,7 +47,7 @@ fi

  function rvm_use {
    current_ruby=$1
-   rvm use "${1}@nokogiri" --create
+   rvm use "${1}@nokogiri" --create || rvm -v
  }

  set -o errexit
@@ -535,6 +535,7 @@ public class XmlDocument extends XmlNode {
      @JRubyMethod(meta=true)
      public static IRubyObject wrapJavaDocument(ThreadContext context, IRubyObject klazz, IRubyObject arg) {
          XmlDocument xmlDocument = (XmlDocument) NokogiriService.XML_DOCUMENT_ALLOCATOR.allocate(context.getRuntime(), getNokogiriClass(context.getRuntime(), "Nokogiri::XML::Document"));
+         RuntimeHelpers.invoke(context, xmlDocument, "initialize");
          Document document = (Document)arg.toJava(Document.class);
          xmlDocument.setDocumentNode(context, document);
          return xmlDocument;
@@ -98,7 +98,6 @@ public class XmlNamespace extends RubyObject {
          this.href = href;
          this.prefixString = prefixString;
          this.hrefString = hrefString;
-         this.attr.setUserData(CACHED_NODE, this, null);
          setInstanceVariable("@document", xmlDocument);
      }

@@ -45,6 +45,7 @@ import java.io.InputStream;
  import java.nio.charset.CharacterCodingException;
  import java.nio.charset.Charset;
  import java.util.ArrayList;
+ import java.util.Iterator;
  import java.util.List;

  import nokogiri.internals.HtmlDomParserContext;
@@ -808,23 +809,46 @@ public class XmlNode extends RubyObject {

      @JRubyMethod(name = {"content", "text", "inner_text"})
      public IRubyObject content(ThreadContext context) {
-         if (content != null && content.isNil()) return content;
+         if (!node.hasChildNodes() && node.getNodeValue() == null &&
+             (node.getNodeType() == Node.TEXT_NODE || node.getNodeType() == Node.CDATA_SECTION_NODE))
+             return context.nil;
          String textContent;
-         if (content != null) textContent = rubyStringToString(content);
-         else if (this instanceof XmlDocument) {
+         if (this instanceof XmlDocument) {
              Node node = ((Document)this.node).getDocumentElement();
              if (node == null) {
                  textContent = "";
              } else {
-                 textContent = ((Document)this.node).getDocumentElement().getTextContent();
+                 Node documentElement = ((Document)this.node).getDocumentElement();
+                 StringBuffer buffer = new StringBuffer();
+                 getTextContentRecursively(context, buffer, documentElement);
+                 textContent = buffer.toString();
              }
          } else {
-             textContent = this.node.getTextContent();
+             StringBuffer buffer = new StringBuffer();
+             getTextContentRecursively(context, buffer, node);
+             textContent = buffer.toString();
          }
-         textContent = NokogiriHelpers.convertEncodingByNKFIfNecessary(context.getRuntime(), (XmlDocument)document(context), textContent);
-         String decodedText = null;
-         if (textContent != null) decodedText = NokogiriHelpers.decodeJavaString(textContent);
-         return stringOrNil(context.getRuntime(), decodedText);
+         NokogiriHelpers.convertEncodingByNKFIfNecessary(context.getRuntime(), (XmlDocument)document(context), textContent);
+         return stringOrNil(context.getRuntime(), textContent);
+     }
+
+     private void getTextContentRecursively(ThreadContext context, StringBuffer buffer, Node currentNode) {
+         String textContent = currentNode.getNodeValue();
+         if (textContent != null && NokogiriHelpers.shouldDecode(currentNode))
+             textContent = NokogiriHelpers.decodeJavaString(textContent);
+         if (textContent != null)
+             buffer.append(textContent);
+         NodeList children = currentNode.getChildNodes();
+         for (int i = 0; i < children.getLength(); i++) {
+             Node child = children.item(i);
+             if (hasTextContent(child))
+                 getTextContentRecursively(context, buffer, child);
+         }
+     }
+
+     private boolean hasTextContent(Node child) {
+         return child.getNodeType() != Node.COMMENT_NODE &&
+             child.getNodeType() != Node.PROCESSING_INSTRUCTION_NODE;
      }

      @JRubyMethod
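
The new `getTextContentRecursively` and `hasTextContent` methods in the hunk above replace the DOM `getTextContent()` call with an explicit depth-first walk that skips comment and processing-instruction children. The traversal can be sketched in plain Ruby as follows. `SketchNode`, `SKIPPED_TYPES`, and `collect_text` are hypothetical stand-ins for the DOM types and are not part of Nokogiri, and the decoding step (`shouldDecode` / `decodeJavaString`) is left out.

```ruby
# Hypothetical stand-in for a DOM node: a type tag, an optional text value,
# and a list of child nodes.
class SketchNode
  attr_reader :type, :value, :children

  def initialize(type, value = nil, children = [])
    @type = type
    @value = value
    @children = children
  end
end

# Child node types whose text is left out, mirroring hasTextContent.
SKIPPED_TYPES = [:comment, :processing_instruction].freeze

# Depth-first walk: append the node's own value (if any), then descend into
# every child that is neither a comment nor a processing instruction.
def collect_text(node, buffer = String.new)
  buffer << node.value if node.value
  node.children.each do |child|
    collect_text(child, buffer) unless SKIPPED_TYPES.include?(child.type)
  end
  buffer
end

TREE = SketchNode.new(:element, nil, [
  SketchNode.new(:text, "Hello, "),
  SketchNode.new(:comment, "not included"),
  SketchNode.new(:element, nil, [SketchNode.new(:text, "world")]),
  SketchNode.new(:processing_instruction, "xml-stylesheet")
])
```

Calling `collect_text(TREE)` concatenates only the two text nodes. As in the Java version, the filter applies to children, so the starting node's own value is appended regardless of its type.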
@@ -892,9 +916,7 @@ public class XmlNode extends RubyObject {
              String key = rubyStringToString(rbkey);
              Element element = (Element) node;
              String value = element.getAttribute(key);
-             if (value != null) {
-                 return context.getRuntime().newString(value);
-             }
+             return nonEmptyStringOrNil(context.getRuntime(), value);
          }
          return context.getRuntime().getNil();
      }
@@ -1004,7 +1026,7 @@ public class XmlNode extends RubyObject {
          NokogiriNamespaceCache nsCache = xmlDocument.getNamespaceCache();
          String prefix = node.getPrefix();
          XmlNamespace namespace = nsCache.get(prefix == null ? "" : prefix, node.getNamespaceURI());
-         if (namespace == null || ((XmlNamespace) namespace).isEmpty()) {
+         if (namespace == null || namespace.isEmpty()) {
              return context.getRuntime().getNil();
          }

@@ -1027,10 +1049,10 @@ public class XmlNode extends RubyObject {
          if (doc instanceof HtmlDocument) return namespace_definitions;
          List<XmlNamespace> namespaces = ((XmlDocument)doc).getNamespaceCache().get(node);
          for (XmlNamespace namespace : namespaces) {
-             ((RubyArray)namespace_definitions).append(namespace);
+             namespace_definitions.append(namespace);
          }

-         return (RubyArray) namespace_definitions;
+         return namespace_definitions;
      }

      /**
@@ -1060,12 +1082,13 @@ public class XmlNode extends RubyObject {
      }

      protected void setContent(IRubyObject content) {
-         this.content = content;
          String javaContent = rubyStringToString(content);
          node.setTextContent(javaContent);
          if (javaContent.length() == 0) return;
          if (node.getNodeType() == Node.TEXT_NODE || node.getNodeType() == Node.CDATA_SECTION_NODE) return;
-         node.getFirstChild().setUserData(NokogiriHelpers.ENCODED_STRING, true, null);
+         if (node.getFirstChild() != null) {
+             node.getFirstChild().setUserData(NokogiriHelpers.ENCODED_STRING, true, null);
+         }
      }

      private void setContent(String content) {
@@ -1073,7 +1096,7 @@ public class XmlNode extends RubyObject {
          this.content = null; // clear cache
      }

-     @JRubyMethod(name = "native_content=", visibility = Visibility.PRIVATE)
+     @JRubyMethod(name = "native_content=")
      public IRubyObject native_content_set(ThreadContext context, IRubyObject content) {
          setContent(content);
          return content;
@@ -1162,13 +1185,42 @@ public class XmlNode extends RubyObject {
              String key = rubyStringToString(rbkey);
              String val = rubyStringToString(rbval);
              Element element = (Element) node;
-             element.setAttribute(key, val);
+
+             int colonIndex = key.indexOf(":");
+             if (colonIndex > 0) {
+                 String prefix = key.substring(0, colonIndex);
+                 String uri = null;
+                 if (prefix.equals("xml")) {
+                     uri = "http://www.w3.org/XML/1998/namespace";
+                 } else {
+                     uri = findNamespaceHref(context, prefix);
+                 }
+                 element.setAttributeNS(uri, key, val);
+             } else {
+                 element.setAttribute(key, val);
+             }
              return this;
          } else {
              return rbval;
          }
      }

+     private String findNamespaceHref(ThreadContext context, String prefix) {
+         XmlNode currentNode = this;
+         while(currentNode != document(context)) {
+             RubyArray namespaces = (RubyArray) currentNode.namespace_scopes(context);
+             Iterator iterator = namespaces.iterator();
+             while(iterator.hasNext()) {
+                 XmlNamespace namespace = (XmlNamespace) iterator.next();
+                 if (namespace.getPrefix().equals(prefix)) {
+                     return namespace.getHref();
+                 }
+             }
+             currentNode = (XmlNode) currentNode.parent(context);
+         }
+         return null;
+     }
+
      @JRubyMethod
      public IRubyObject parent(ThreadContext context) {
          /*
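
The attribute setter in the hunk above now splits a prefixed key such as `dc:title` at the colon, maps the reserved `xml` prefix to its fixed URI, and otherwise walks up the ancestors looking for a matching namespace declaration. The lookup can be sketched in plain Ruby as follows. `ScopedNode`, `find_namespace_href`, and `resolve_attribute` are hypothetical names, and each node here stores only its own declarations in a Hash, which is a simplification of `namespace_scopes`.

```ruby
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"

# Hypothetical stand-in for an element: a parent link plus the namespace
# declarations made on this element (prefix => href).
class ScopedNode
  attr_reader :parent, :namespaces

  def initialize(parent = nil, namespaces = {})
    @parent = parent
    @namespaces = namespaces
  end
end

# Walk from the node towards the root and return the first href declared
# for the prefix, or nil when no ancestor declares it.
def find_namespace_href(node, prefix)
  current = node
  while current
    href = current.namespaces[prefix]
    return href if href
    current = current.parent
  end
  nil
end

# Return [namespace_uri, qualified_name]; the URI is nil for unprefixed keys,
# which corresponds to the plain setAttribute branch.
def resolve_attribute(node, key)
  colon = key.index(":")
  return [nil, key] unless colon && colon > 0
  prefix = key[0, colon]
  uri = prefix == "xml" ? XML_NAMESPACE : find_namespace_href(node, prefix)
  [uri, key]
end

ROOT = ScopedNode.new(nil, "dc" => "http://purl.org/dc/elements/1.1/")
CHILD = ScopedNode.new(ROOT)
```

For example, `resolve_attribute(CHILD, "dc:title")` finds the declaration on `ROOT`, while an undeclared prefix resolves to a `nil` URI, matching the `return null` fallthrough in `findNamespaceHref`.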
@@ -34,13 +34,11 @@ package nokogiri;

  import static nokogiri.internals.NokogiriHelpers.getCachedNodeOrCreate;
  import static nokogiri.internals.NokogiriHelpers.rubyStringToString;
- import static nokogiri.internals.NokogiriHelpers.stringOrNil;
  import nokogiri.internals.SaveContextVisitor;

  import org.jruby.Ruby;
  import org.jruby.RubyClass;
  import org.jruby.anno.JRubyClass;
- import org.jruby.anno.JRubyMethod;
  import org.jruby.runtime.ThreadContext;
  import org.jruby.runtime.builtin.IRubyObject;
  import org.w3c.dom.Document;
@@ -88,17 +86,7 @@ public class XmlText extends XmlNode {
          if (name == null) name = context.getRuntime().newString("text");
          return name;
      }
-
-     @Override
-     @JRubyMethod(name = {"content", "text", "inner_text"})
-     public IRubyObject content(ThreadContext context) {
-         if (content == null || content.isNil()) {
-             return stringOrNil(context.getRuntime(), node.getTextContent());
-         } else {
-             return content;
-         }
-     }
-
+
      @Override
      public void accept(ThreadContext context, SaveContextVisitor visitor) {
          visitor.enter((Text)node);
@@ -114,6 +102,6 @@ public class XmlText extends XmlNode {
              }
              child = child.getNextSibling();
          }
-         visitor.leave((Text)node);
+         visitor.leave(node);
      }
  }
@@ -73,11 +73,11 @@ public class NokogiriHandler extends DefaultHandler2 implements XmlDeclHandler {
       * TODO: should these be stored in the document 'errors' array?
       * Currently only string messages are stored there.
       */
-     private LinkedList<XmlSyntaxError> errors = new LinkedList<XmlSyntaxError>();
-
+     private final LinkedList<XmlSyntaxError> errors = new LinkedList<XmlSyntaxError>();
+
      private Locator locator;
-     private ArrayDeque<Integer> lines;
-     private ArrayDeque<Integer> columns;
+     private final ArrayDeque<Integer> lines;
+     private final ArrayDeque<Integer> columns;
      private static String htmlParserName = "Nokogiri::HTML::SAX::Parser";
      private boolean needEmptyAttrCheck = false;

@@ -106,6 +106,7 @@ public class NokogiriHandler extends DefaultHandler2 implements XmlDeclHandler {
          call("start_document");
      }

+     @Override
      public void xmlDecl(String version, String encoding, String standalone) {
          call("xmldecl", stringOrNil(ruby, version),
               stringOrNil(ruby, encoding),
@@ -117,6 +118,11 @@ public class NokogiriHandler extends DefaultHandler2 implements XmlDeclHandler {
          call("end_document");
      }

+     @Override
+     public void processingInstruction(String target, String data) {
+         call("processing_instruction", ruby.newString(target), ruby.newString(data));
+     }
+
      /*
       * This has to call either "start_element" or
       * "start_element_namespace" depending on whether there are any