libxml-ruby 0.9.3 → 0.9.4
- data/CHANGES +9 -0
- data/README +61 -128
- data/doc/css/normal.css +182 -0
- data/doc/img/raze-tiny.png +0 -0
- data/doc/img/red-cube.jpg +0 -0
- data/doc/img/xml-ruby.png +0 -0
- data/doc/index.xml +43 -0
- data/doc/install.xml +77 -0
- data/doc/layout.rhtml +38 -0
- data/doc/layout.xsl +67 -0
- data/doc/license.xml +32 -0
- data/doc/log/changelog.xml +1324 -0
- data/doc/log/changelog.xsl +42 -0
- data/ext/libxml/ruby_xml_document.c +1084 -1057
- data/ext/libxml/ruby_xml_html_parser.c +37 -40
- data/ext/libxml/ruby_xml_input.c +17 -40
- data/ext/libxml/ruby_xml_input.h +2 -2
- data/ext/libxml/ruby_xml_parser.c +151 -151
- data/ext/libxml/ruby_xml_reader.c +910 -893
- data/ext/libxml/ruby_xml_sax_parser.c +174 -174
- data/ext/libxml/ruby_xml_sax_parser.h +12 -12
- data/ext/libxml/ruby_xml_xpointer.h +13 -25
- data/ext/libxml/version.h +2 -2
- data/ext/vc/libxml_ruby.vcproj +1 -1
- data/test/model/ruby-lang.html +238 -0
- data/test/tc_html_parser.rb +2 -12
- data/test/tc_reader.rb +87 -87
- metadata +17 -3
- data/test/test.rb +0 -8
data/CHANGES
CHANGED
@@ -1,5 +1,14 @@
 (See log/ChangeLog for more detailed changes derived directly from source control.)
 
+== 0.9.4 / 2008-11-24 Charlie Savage
+
+* Update HTML parser so that it can read files, strings and io
+streams.
+
+* Update HTML parser to support user specified encodings.
+
+* Additional C code cleanup.
+
 == 0.9.3 / 2008-11-22 Charlie Savage
 
 * Fixed segementation fault caused by documents being freed
data/README
CHANGED
@@ -4,12 +4,11 @@
 The libxml gem provides Ruby language bindings for GNOME's Libxml2
 XML toolkit. It is free software, released under the MIT License.
 
-libxml-ruby
-
-* Speed - libxml is many times faster than REXML
-* Features - libxml provides a number of additional features over REXML, including XML Schema Validation, RelaxNg validation, xslt (see libxslt-ruby)
-* Conformance - libxml passes all 1800+ tests from the OASIS XML Tests Suite
+We think libxml-ruby is the best XML library for Ruby because:
 
+* Speed - Its much faster than REXML and Hpricot
+* Features - It provides an amazing number of featues
+* Conformance - It passes all 1800+ tests from the OASIS XML Tests Suite
 
 == Requirements
 libxml-ruby requires Ruby 1.8.4 or higher. It is dependent on
@@ -41,58 +40,56 @@ libxml2 and iconv. The gem also includes a Microsoft VC++ 2008
 solution. If you wish to run a debug version of libxml-ruby on
 Windows, then it is highly recommended you use VC++.
 
-==
-
-from the OASIS XML Tests Suite. It includes rich functionality such as:
+== Getting Started
+Using libxml is easy. First decide what parser you want to use:
 
-*
-
-*
-
-*
-*
-
-* RelaxNG Schemas (LibXML::XML::RelaxNG)
-* XML Schema (LibXML::XML::Schema)
-* XSLT (http://rubyforge.org/projects/libxsl/)
+* Generally you'll want to use the XML::Parser which provides
+a tree based API
+* For large documents that won't fit into memory, or if you
+prefer an input based API, then use XML::Reader
+* If you are parsing HTML files, then use XML::HTMLParser
+* If you are masochistic or old stream, then use the XML::SaxParser
+which provides a callback API.
 
-
-
+Once you choose a parser, then choose a datasource and its
+encoding. Libxml can parse files, strings, URIs and IO stream.
+For more information, see XML::Input.
 
-==
-
-
-of a couple simple benchmarks recently blogged about on the
-Web (you can find them in the benchmark directory of the
-libxml distribution).
+== Advanced Functionality
+Beyond the basics of parsing and processing XML and HTML documents,
+lLibxml provides a wealth of additional functionality.
 
-
+Most commonly, you'll want to use its XML::XPath support, which makes
+it easy to search for data inside and XML document. Although not as
+popular, XML::XPointer provides another API for finding data inside
+an XML document.
 
-
-
-
-
+Often times you'll need to validate data before processing it. For example,
+if you accept user generated content submitted over the Web, you'll
+want to first verify it does not contain malicious code such as embedded scripts.
+This can be done using libxml's powerful set of validators:
 
-
+* DTDs (LibXML::XML::Dtd)
+* Relax Schemas (LibXML::XML::RelaxNG)
+* XML Schema (LibXML::XML::Schema)
 
-
-
-
-rexml    22.859000   0.047000  22.906000 ( 23.203000)
+Finally, if you'd like to use XSL Transformations to process data,
+then also install the libxslt gem which is available at
+http://rubyforge.org/projects/libxsl/.
 
-==
+== Usage
 For in-depth information about using libxml-ruby please refer
-to its online Rdoc documentation.
+to its online Rdoc documentation.
 
-All libxml classes are in the LibXML::XML module. The
-
-the LibXML module into the global namespace, allowing you to
-write code like this:
+All libxml classes are in the LibXML::XML module. The easiest
+way to use libxml is to require 'xml'. This will mixin
+the LibXML module into the global namespace, allowing you to
+write code like this:
 
 require 'xml'
 document = XML::Document.new
 
-However, when creating an application or library you plan to
+However, when creating an application or library you plan to
 redistribute, it is best to not add the LibXML module to the global
 namespace, in which case you can either write your code like this:
 
@@ -114,104 +111,40 @@ and include LibXML into it. For example:
 end
 end
 
-For simplicity's sake
-shown below.
+For simplicity's sake, the documentation uses the xml module in its examples.
 
-
-
+== Performance
+In addition to being feature rich and conformation, the main reason
+people use libxml-ruby is for performance. Here are the results
+of a couple simple benchmarks recently blogged about on the
+Web (you can find them in the benchmark directory of the
+libxml distribution).
 
-
-doc = XML::Document.file('output.xml')
-root = doc.root
-
-puts "Root element name: #{root.name}"
-
-elem3 = root.find('elem3').to_a.first
-puts "Elem3: #{elem3['attr']}"
-
-doc.find('//root_node/foo/bar').each do |node|
-puts "Node path: #{node.path} \t Contents: #{node.content}"
-end
+From http://depixelate.com/2008/4/23/ruby-xml-parsing-benchmarks
 
-
+              user     system      total        real
+libxml    0.032000   0.000000   0.032000 (  0.031000)
+Hpricot   0.640000   0.031000   0.671000 (  0.890000)
+REXML     1.813000   0.047000   1.860000 (  2.031000)
 
-
-Elem3: baz
-Node path: /root_node/foo/bar[1] Contents: 1
-Node path: /root_node/foo/bar[2] Contents: 2
-Node path: /root_node/foo/bar[3] Contents: 3
-Node path: /root_node/foo/bar[4] Contents: 4
-Node path: /root_node/foo/bar[5] Contents: 5
-Node path: /root_node/foo/bar[6] Contents: 6
-Node path: /root_node/foo/bar[7] Contents: 7
-Node path: /root_node/foo/bar[8] Contents: 8
-Node path: /root_node/foo/bar[9] Contents: 9
-Node path: /root_node/foo/bar[10] Contents: 10
+From https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks/
 
-
-
+              user     system      total        real
+libxml    0.641000   0.031000   0.672000 (  0.672000)
+hpricot   5.359000   0.062000   5.421000 (  5.516000)
+rexml    22.859000   0.047000  22.906000 ( 23.203000)
 
-require 'xml'
-
-doc = XML::Document.new()
-doc.root = XML::Node.new('root_node')
-root = doc.root
-
-root << elem1 = XML::Node.new('elem1')
-elem1['attr1'] = 'val1'
-elem1['attr2'] = 'val2'
-
-root << elem2 = XML::Node.new('elem2')
-elem2['attr1'] = 'val1'
-elem2['attr2'] = 'val2'
-
-root << elem3 = XML::Node.new('elem3')
-elem3 << elem4 = XML::Node.new('elem4')
-elem3 << elem5 = XML::Node.new('elem5')
-
-elem5 << elem6 = XML::Node.new('elem6')
-elem6 << 'Content for element 6'
-
-elem3['attr'] = 'baz'
-
-format = true
-doc.save('output.xml', format)
-
-The file output.xml contains:
-
-<?xml version="1.0"?>
-<root_node>
-<elem1 attr1="val1" attr2="val2"/>
-<elem2 attr1="val1" attr2="val2"/>
-<elem3 attr="baz">
-<elem4/>
-<elem5>
-<elem6>Content for element 6</elem6>
-</elem5>
-</elem3>
-<foo>
-<bar>1</bar>
-<bar>2</bar>
-<bar>3</bar>
-<bar>4</bar>
-<bar>5</bar>
-<bar>6</bar>
-<bar>7</bar>
-<bar>8</bar>
-<bar>9</bar>
-<bar>10</bar>
-</foo>
-</root_node>
 
 == DOCUMENTATION
+For more information please refer to the documentation.
+
 RDoc comments are included - run 'rake doc' to generate documentation.
 You can find the latest documentation at:
 
 * http://libxml.rubyforge.org/rdoc/
 
+If you have any questions, please send email to libxml-devel@rubyforge.org.
+
 == License
 See LICENSE for license information.
 
-== MORE INFORMATION
-For more information please refer to the documentation. If you have any
-questions, please send email to libxml-devel@rubyforge.org.
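The Performance section added to the README quotes raw Benchmark-style rows without saying how large the gap is. As a rough sketch, the Ruby below parses the second table (the svn.concord.org figures) and expresses each library's wall-clock time as a multiple of libxml's. The names `SVN_CONCORD_RESULTS`, `parse_row` and `slowdown_vs` are made up for this illustration and are not part of libxml-ruby or its benchmark directory. The row text is copied from the diff.

```ruby
# Rows copied from the README diff: label, then user, system, total and
# (real) times in seconds. The constant name is illustrative only.
SVN_CONCORD_RESULTS = <<~ROWS
  libxml    0.641000   0.031000   0.672000 (  0.672000)
  hpricot   5.359000   0.062000   5.421000 (  5.516000)
  rexml    22.859000   0.047000  22.906000 ( 23.203000)
ROWS

# Parse one row into [label, real_seconds]. The "real" column is the
# parenthesised wall-clock time at the end of the row.
def parse_row(row)
  label = row.split.first
  real = row[/\(\s*([\d.]+)\s*\)/, 1].to_f
  [label, real]
end

# Return a hash of label => real time divided by the baseline's real time,
# rounded to one decimal place.
def slowdown_vs(rows, baseline)
  times = rows.each_line.map { |row| parse_row(row) }.to_h
  base = times.fetch(baseline)
  times.transform_values { |real| (real / base).round(1) }
end

slowdown_vs(SVN_CONCORD_RESULTS, 'libxml').each do |label, factor|
  puts format('%-8s %5.1fx', label, factor)
end
```

On these figures hpricot comes out at roughly 8 times libxml's wall-clock time and rexml at roughly 35 times. The ratios describe only this one quoted run, not the libraries in general.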
data/doc/css/normal.css
ADDED
@@ -0,0 +1,182 @@
+/*
+* Uncopyrighted 2005 Ross Bamford.
+*
+* rosco at roscopeco dot co dot uk
+*/
+body {
+background: #ffffff;
+color: #000000;
+font-family: Microsoft sans-serif, sans-serif, arial, helvetica;
+font-size: 12px;
+padding: 0px;
+margin: 0px 0px 0px 0px;
+}
+
+.container {
+width: 800px;
+margin: 0 auto;
+}
+
+/* ** links ** */
+a {
+color: red;
+text-decoration: none;
+}
+
+a:hover {
+color: red;
+text-decoration: underline;
+}
+
+a:active {
+color: red;
+text-decoration: underline;
+}
+
+.navlinks a {
+color: red;
+text-decoration: none;
+font-weight: bold;
+}
+
+.navlinks a:hover {
+color: red;
+text-decoration: underline;
+font-weight: bold;
+}
+
+.navlinks {
+padding: 10px;
+background: white;
+white-space: nowrap
+}
+
+div.copyright {
+/* Copyright bit on pages */
+color: #909090;
+position: relative;
+top: 5em;
+right: 2%;
+text-align: right;
+font-size: 8pt;
+}
+
+/* * page styles *** */
+
+h1.title {
+font-size: 48px;
+padding-left: 0;
+}
+
+h1 {
+padding: 10px;
+}
+
+h2 {
+border-bottom: thin #959595 solid;
+}
+
+h3 {
+border-bottom: thin #b8c8c8 solid;
+}
+
+h5 {
+border-bottom: thin #c0c0d8 solid;
+}
+
+div.note {
+background: #e8e8fa;
+border: thin dashed #3e5972;
+position: relative;
+width: 90%;
+left: 5%;
+right: 5%;
+text-align: right;
+font-size: 10pt;
+padding: 5px;
+margin-bottom: 5px;
+}
+
+/* * syntax ******** */
+pre.ruby {
+background: #f5f5f5;
+border: thin dashed #3e5972;
+padding: 10px;
+margin-left: 2em;
+}
+
+pre.ruby span.normal {
+color: #000000;
+}
+
+pre.ruby span.comment {
+color: #789a86;
+text-decoration: oblique;
+}
+
+pre.ruby span.ident {
+color: #0b0202;
+}
+
+pre.ruby span.punct {
+color: #8a7070;
+}
+
+pre.ruby span.symbol {
+color: #aa1010;
+font-weight: bold;
+}
+
+pre.ruby span.keyword {
+color: #903030;
+font-weight: bold;
+}
+
+pre.ruby span.constant {
+color: #3e5972;
+font-weight: bold;
+}
+
+pre.ruby span.string {
+color: #2020f0;
+}
+
+pre.ruby span.char {
+color: #2020f0;
+font-weight: bold;
+}
+
+pre.ruby span.number {
+color: #aa1010;
+}
+
+pre.ruby span.regex {
+color: #552090;
+}
+
+pre.ruby span.expr {
+color: #101080;
+font-weight: bold;
+}
+
+pre.ruby span.global {
+color: #557462;
+}
+
+pre.ruby span.class {
+color: #3e5972;
+font-weight: bold;
+}
+
+pre.ruby span.method {
+color: #aa1010;
+}
+
+pre.ruby span.attribute {
+color: #3e5972;
+}
+
+pre.ruby span.escape {
+color: #2020f0;
+font-weight: bold;
+}
Binary file
|
Binary file
|
Binary file
|
data/doc/index.xml
ADDED
@@ -0,0 +1,43 @@
+<?xml version="1.0" encoding="ISO-8859-1" ?>
+<?xml-stylesheet href="layout.xsl" type="text/xsl" ?>
+
+<content>
+
+<h2> Welcome to LibXml Ruby </h2>
+
+<p>The <span style="color: red;">Libxml-Ruby</span> project provides Ruby
+language bindings for the <a href="http://xmlsoft.org">GNOME Libxml2 XML toolkit</a>.
+It is free software, released under the <a href="license.xml">MIT License</a>.</p>
+
+<p>Libxml-ruby's primary advantage over REXML is performance - if speed is your need,
+these are good libraries to consider, as demonstrated by the informal benchmark below.</p>
+
+<table border="1" style="border: 1px solid red; margin: 30px;">
+<tr><td colspan="3"><b>Speed Comparison libxml vs. rexml</b></td></tr>
+<tr><th> in seconds </th><th> libxml </th><th> rexml </th></tr>
+<tr><td> opening </td><td> 0.003954 </td><td> 0.104750 </td></tr>
+<tr><td> attribute_add </td><td> 0.001895 </td><td> 0.011114 </td></tr>
+<tr><td> subelems </td><td> 0.000585 </td><td> 0.004729 </td></tr>
+<tr><td> xpath </td><td> 0.013269 </td><td> 2.981499 </td></tr>
+</table>
+
+<h2>Download</h2>
+
+<p>You can find the latest release at:</p>
+
+<pre>
+<a href="http://rubyforge.org/frs/?group_id=494">http://rubyforge.org/frs/?group_id=494</a>
+</pre>
+
+<p>Libxml-Ruby is also available for installation via <a href="http://rubygems.rubyforge.org">Rubygems</a>
+-- see the <a href="install.xml">installation page</a> for details.</p>
+
+<h2> Project Status </h2>
+
+<p>The code has now been updated to work with Ruby 1.8, and is compiling cleanly
+and working well with GCC 4.x. We still have a number of open bugs to address,
+which is being done as we work toward a 0.4.0 release and the library is
+generally fairly stable in use.</p>
+
+</content>
+