libxml-ruby 0.9.3 → 0.9.4

data/CHANGES CHANGED
@@ -1,5 +1,14 @@
  (See log/ChangeLog for more detailed changes derived directly from source control.)

+ == 0.9.4 / 2008-11-24 Charlie Savage
+
+ * Update HTML parser so that it can read files, strings and IO
+   streams.
+
+ * Update HTML parser to support user-specified encodings.
+
+ * Additional C code cleanup.
+
  == 0.9.3 / 2008-11-22 Charlie Savage

  * Fixed segmentation fault caused by documents being freed
data/README CHANGED
@@ -4,12 +4,11 @@
  The libxml gem provides Ruby language bindings for GNOME's Libxml2
  XML toolkit. It is free software, released under the MIT License.

- libxml-ruby provides several advantages over REXML:
-
- * Speed - libxml is many times faster than REXML
- * Features - libxml provides a number of additional features over REXML, including XML Schema Validation, RelaxNg validation, xslt (see libxslt-ruby)
- * Conformance - libxml passes all 1800+ tests from the OASIS XML Tests Suite
+ We think libxml-ruby is the best XML library for Ruby because:

+ * Speed - It's much faster than REXML and Hpricot
+ * Features - It provides an amazing number of features
+ * Conformance - It passes all 1800+ tests from the OASIS XML Tests Suite

  == Requirements
  libxml-ruby requires Ruby 1.8.4 or higher. It is dependent on
@@ -41,58 +40,56 @@ libxml2 and iconv. The gem also includes a Microsoft VC++ 2008
  solution. If you wish to run a debug version of libxml-ruby on
  Windows, then it is highly recommended you use VC++.

- == Functionality
- LibXML is a highly conformant XML parser, passing all 1800+ tests
- from the OASIS XML Tests Suite. It includes rich functionality such as:
+ == Getting Started
+ Using libxml is easy. First decide what parser you want to use:

- * DOM (LibXML::XML::Parser)
- * SAX (LibXML::XML::SaxParser)
- * HTML Parsing (LibXML::XML::HTMLParser)
- * Reader (LibXML::XML::Reader)
- * XPath (LibXML::XML::XPath)
- * XPointer (LibXML::XML::XPointer)
- * DTDs (LibXML::XML::Dtd)
- * RelaxNG Schemas (LibXML::XML::RelaxNG)
- * XML Schema (LibXML::XML::Schema)
- * XSLT (http://rubyforge.org/projects/libxsl/)
+ * Generally you'll want to use the XML::Parser which provides
+   a tree based API
+ * For large documents that won't fit into memory, or if you
+   prefer an input based API, then use XML::Reader
+ * If you are parsing HTML files, then use XML::HTMLParser
+ * If you are masochistic or old school, then use the XML::SaxParser
+   which provides a callback API.

- libxml-ruby provides impressive coverage of libxml's functionality
- through an easy-to-use C api.
+ Once you choose a parser, then choose a datasource and its
+ encoding. Libxml can parse files, strings, URIs and IO streams.
+ For more information, see XML::Input.

- == Performance
- In addition to being feature rich and conformant, the main reason
- people use libxml-ruby is for performance. Here are the results
- of a couple of simple benchmarks recently blogged about on the
- Web (you can find them in the benchmark directory of the
- libxml distribution).
+ == Advanced Functionality
+ Beyond the basics of parsing and processing XML and HTML documents,
+ libxml provides a wealth of additional functionality.

- From http://depixelate.com/2008/4/23/ruby-xml-parsing-benchmarks
+ Most commonly, you'll want to use its XML::XPath support, which makes
+ it easy to search for data inside an XML document. Although not as
+ popular, XML::XPointer provides another API for finding data inside
+ an XML document.

-                  user     system      total         real
-   libxml     0.032000   0.000000   0.032000 (  0.031000)
-   Hpricot    0.640000   0.031000   0.671000 (  0.890000)
-   REXML      1.813000   0.047000   1.860000 (  2.031000)
+ Often you'll need to validate data before processing it. For example,
+ if you accept user generated content submitted over the Web, you'll
+ want to first verify it does not contain malicious code such as embedded scripts.
+ This can be done using libxml's powerful set of validators:

- From https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks/
+ * DTDs (LibXML::XML::Dtd)
+ * RelaxNG Schemas (LibXML::XML::RelaxNG)
+ * XML Schema (LibXML::XML::Schema)

-                  user     system      total         real
-   libxml     0.641000   0.031000   0.672000 (  0.672000)
-   hpricot    5.359000   0.062000   5.421000 (  5.516000)
-   rexml     22.859000   0.047000  22.906000 ( 23.203000)
+ Finally, if you'd like to use XSL Transformations to process data,
+ then also install the libxslt gem which is available at
+ http://rubyforge.org/projects/libxsl/.

- == USAGE
+ == Usage
  For in-depth information about using libxml-ruby please refer
- to its online Rdoc documentation.
+ to its online Rdoc documentation.

- All libxml classes are in the LibXML::XML module. The most
- expedient way to use libxml is to require 'xml'. This will mix
- the LibXML module into the global namespace, allowing you to
- write code like this:
+ All libxml classes are in the LibXML::XML module. The easiest
+ way to use libxml is to require 'xml'. This will mix
+ the LibXML module into the global namespace, allowing you to
+ write code like this:

    require 'xml'
    document = XML::Document.new

- However, when creating an application or library you plan to
+ However, when creating an application or library you plan to
  redistribute, it is best to not add the LibXML module to the global
  namespace, in which case you can either write your code like this:

@@ -114,104 +111,40 @@ and include LibXML into it. For example:
      end
    end

- For simplicity's sake we will use require 'xml' in the basic examples
- shown below.
+ For simplicity's sake, the documentation uses the xml module in its examples.

- === READING
- There are several ways to read xml documents.
+ == Performance
+ In addition to being feature rich and conformant, the main reason
+ people use libxml-ruby is for performance. Here are the results
+ of a couple of simple benchmarks recently blogged about on the
+ Web (you can find them in the benchmark directory of the
+ libxml distribution).

-   require 'xml'
-   doc = XML::Document.file('output.xml')
-   root = doc.root
-
-   puts "Root element name: #{root.name}"
-
-   elem3 = root.find('elem3').to_a.first
-   puts "Elem3: #{elem3['attr']}"
-
-   doc.find('//root_node/foo/bar').each do |node|
-     puts "Node path: #{node.path} \t Contents: #{node.content}"
-   end
+ From http://depixelate.com/2008/4/23/ruby-xml-parsing-benchmarks

- And your terminal should look like this:
+                  user     system      total         real
+   libxml     0.032000   0.000000   0.032000 (  0.031000)
+   Hpricot    0.640000   0.031000   0.671000 (  0.890000)
+   REXML      1.813000   0.047000   1.860000 (  2.031000)

-   Root element name: root_node
-   Elem3: baz
-   Node path: /root_node/foo/bar[1] Contents: 1
-   Node path: /root_node/foo/bar[2] Contents: 2
-   Node path: /root_node/foo/bar[3] Contents: 3
-   Node path: /root_node/foo/bar[4] Contents: 4
-   Node path: /root_node/foo/bar[5] Contents: 5
-   Node path: /root_node/foo/bar[6] Contents: 6
-   Node path: /root_node/foo/bar[7] Contents: 7
-   Node path: /root_node/foo/bar[8] Contents: 8
-   Node path: /root_node/foo/bar[9] Contents: 9
-   Node path: /root_node/foo/bar[10] Contents: 10
+ From https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks/

- === WRITING
- To write a simple document:
+                  user     system      total         real
+   libxml     0.641000   0.031000   0.672000 (  0.672000)
+   hpricot    5.359000   0.062000   5.421000 (  5.516000)
+   rexml     22.859000   0.047000  22.906000 ( 23.203000)

-   require 'xml'
-
-   doc = XML::Document.new()
-   doc.root = XML::Node.new('root_node')
-   root = doc.root
-
-   root << elem1 = XML::Node.new('elem1')
-   elem1['attr1'] = 'val1'
-   elem1['attr2'] = 'val2'
-
-   root << elem2 = XML::Node.new('elem2')
-   elem2['attr1'] = 'val1'
-   elem2['attr2'] = 'val2'
-
-   root << elem3 = XML::Node.new('elem3')
-   elem3 << elem4 = XML::Node.new('elem4')
-   elem3 << elem5 = XML::Node.new('elem5')
-
-   elem5 << elem6 = XML::Node.new('elem6')
-   elem6 << 'Content for element 6'
-
-   elem3['attr'] = 'baz'
-
-   format = true
-   doc.save('output.xml', format)
-
- The file output.xml contains:
-
-   <?xml version="1.0"?>
-   <root_node>
-     <elem1 attr1="val1" attr2="val2"/>
-     <elem2 attr1="val1" attr2="val2"/>
-     <elem3 attr="baz">
-       <elem4/>
-       <elem5>
-         <elem6>Content for element 6</elem6>
-       </elem5>
-     </elem3>
-     <foo>
-       <bar>1</bar>
-       <bar>2</bar>
-       <bar>3</bar>
-       <bar>4</bar>
-       <bar>5</bar>
-       <bar>6</bar>
-       <bar>7</bar>
-       <bar>8</bar>
-       <bar>9</bar>
-       <bar>10</bar>
-     </foo>
-   </root_node>

  == DOCUMENTATION
+ For more information please refer to the documentation.
+
  RDoc comments are included - run 'rake doc' to generate documentation.
  You can find the latest documentation at:

  * http://libxml.rubyforge.org/rdoc/

+ If you have any questions, please send email to libxml-devel@rubyforge.org.
+
  == License
  See LICENSE for license information.

- == MORE INFORMATION
- For more information please refer to the documentation. If you have any
- questions, please send email to libxml-devel@rubyforge.org.
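
The Getting Started guidance in the new README (use XML::Parser for a tree API, then search the tree with XPath) might look roughly like the sketch below. It is an illustration only, not code from the gem's documentation. The sample string and the bar_contents helper are hypothetical. It also assumes a libxml-ruby release that loads with require 'libxml' and exposes the LibXML::XML namespace, whereas the README in this diff uses require 'xml'. The guard around the require only lets the script exit quietly when the gem is not installed.

```ruby
# Assumption: the libxml-ruby gem is installed and loads with require 'libxml'.
begin
  require 'libxml'
  LIBXML_AVAILABLE = true
rescue LoadError
  LIBXML_AVAILABLE = false
end

# Hypothetical sample document, modelled on the output.xml shown in the old README.
SAMPLE = '<root_node><foo><bar>1</bar><bar>2</bar></foo></root_node>'

# Hypothetical helper: parse a string into a tree, then collect the text
# content of every node matched by an XPath expression.
def bar_contents(xml)
  doc = LibXML::XML::Parser.string(xml).parse
  doc.find('//root_node/foo/bar').map { |node| node.content }
end

puts bar_contents(SAMPLE).inspect if LIBXML_AVAILABLE
```

Keeping the LibXML:: prefix follows the README's advice to avoid mixing the module into the global namespace in code you plan to redistribute.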
@@ -0,0 +1,182 @@
+ /*
+  * Uncopyrighted 2005 Ross Bamford.
+  *
+  * rosco at roscopeco dot co dot uk
+  */
+ body {
+   background: #ffffff;
+   color: #000000;
+   font-family: Microsoft sans-serif, sans-serif, arial, helvetica;
+   font-size: 12px;
+   padding: 0px;
+   margin: 0px 0px 0px 0px;
+ }
+
+ .container {
+   width: 800px;
+   margin: 0 auto;
+ }
+
+ /* ** links ** */
+ a {
+   color: red;
+   text-decoration: none;
+ }
+
+ a:hover {
+   color: red;
+   text-decoration: underline;
+ }
+
+ a:active {
+   color: red;
+   text-decoration: underline;
+ }
+
+ .navlinks a {
+   color: red;
+   text-decoration: none;
+   font-weight: bold;
+ }
+
+ .navlinks a:hover {
+   color: red;
+   text-decoration: underline;
+   font-weight: bold;
+ }
+
+ .navlinks {
+   padding: 10px;
+   background: white;
+   white-space: nowrap
+ }
+
+ div.copyright {
+   /* Copyright bit on pages */
+   color: #909090;
+   position: relative;
+   top: 5em;
+   right: 2%;
+   text-align: right;
+   font-size: 8pt;
+ }
+
+ /* * page styles *** */
+
+ h1.title {
+   font-size: 48px;
+   padding-left: 0;
+ }
+
+ h1 {
+   padding: 10px;
+ }
+
+ h2 {
+   border-bottom: thin #959595 solid;
+ }
+
+ h3 {
+   border-bottom: thin #b8c8c8 solid;
+ }
+
+ h5 {
+   border-bottom: thin #c0c0d8 solid;
+ }
+
+ div.note {
+   background: #e8e8fa;
+   border: thin dashed #3e5972;
+   position: relative;
+   width: 90%;
+   left: 5%;
+   right: 5%;
+   text-align: right;
+   font-size: 10pt;
+   padding: 5px;
+   margin-bottom: 5px;
+ }
+
+ /* * syntax ******** */
+ pre.ruby {
+   background: #f5f5f5;
+   border: thin dashed #3e5972;
+   padding: 10px;
+   margin-left: 2em;
+ }
+
+ pre.ruby span.normal {
+   color: #000000;
+ }
+
+ pre.ruby span.comment {
+   color: #789a86;
+   text-decoration: oblique;
+ }
+
+ pre.ruby span.ident {
+   color: #0b0202;
+ }
+
+ pre.ruby span.punct {
+   color: #8a7070;
+ }
+
+ pre.ruby span.symbol {
+   color: #aa1010;
+   font-weight: bold;
+ }
+
+ pre.ruby span.keyword {
+   color: #903030;
+   font-weight: bold;
+ }
+
+ pre.ruby span.constant {
+   color: #3e5972;
+   font-weight: bold;
+ }
+
+ pre.ruby span.string {
+   color: #2020f0;
+ }
+
+ pre.ruby span.char {
+   color: #2020f0;
+   font-weight: bold;
+ }
+
+ pre.ruby span.number {
+   color: #aa1010;
+ }
+
+ pre.ruby span.regex {
+   color: #552090;
+ }
+
+ pre.ruby span.expr {
+   color: #101080;
+   font-weight: bold;
+ }
+
+ pre.ruby span.global {
+   color: #557462;
+ }
+
+ pre.ruby span.class {
+   color: #3e5972;
+   font-weight: bold;
+ }
+
+ pre.ruby span.method {
+   color: #aa1010;
+ }
+
+ pre.ruby span.attribute {
+   color: #3e5972;
+ }
+
+ pre.ruby span.escape {
+   color: #2020f0;
+   font-weight: bold;
+ }
Binary file
Binary file
Binary file
data/doc/index.xml ADDED
@@ -0,0 +1,43 @@
+ <?xml version="1.0" encoding="ISO-8859-1" ?>
+ <?xml-stylesheet href="layout.xsl" type="text/xsl" ?>
+
+ <content>
+
+ <h2> Welcome to LibXml Ruby </h2>
+
+ <p>The <span style="color: red;">Libxml-Ruby</span> project provides Ruby
+ language bindings for the <a href="http://xmlsoft.org">GNOME Libxml2 XML toolkit</a>.
+ It is free software, released under the <a href="license.xml">MIT License</a>.</p>
+
+ <p>Libxml-ruby's primary advantage over REXML is performance - if speed is your need,
+ these are good libraries to consider, as demonstrated by the informal benchmark below.</p>
+
+ <table border="1" style="border: 1px solid red; margin: 30px;">
+ <tr><td colspan="3"><b>Speed Comparison libxml vs. rexml</b></td></tr>
+ <tr><th> in seconds </th><th> libxml </th><th> rexml </th></tr>
+ <tr><td> opening </td><td> 0.003954 </td><td> 0.104750 </td></tr>
+ <tr><td> attribute_add </td><td> 0.001895 </td><td> 0.011114 </td></tr>
+ <tr><td> subelems </td><td> 0.000585 </td><td> 0.004729 </td></tr>
+ <tr><td> xpath </td><td> 0.013269 </td><td> 2.981499 </td></tr>
+ </table>
+
+ <h2>Download</h2>
+
+ <p>You can find the latest release at:</p>
+
+ <pre>
+ <a href="http://rubyforge.org/frs/?group_id=494">http://rubyforge.org/frs/?group_id=494</a>
+ </pre>
+
+ <p>Libxml-Ruby is also available for installation via <a href="http://rubygems.rubyforge.org">Rubygems</a>
+ -- see the <a href="install.xml">installation page</a> for details.</p>
+
+ <h2> Project Status </h2>
+
+ <p>The code has now been updated to work with Ruby 1.8, and is compiling cleanly
+ and working well with GCC 4.x. We still have a number of open bugs to address,
+ which is being done as we work toward a 0.4.0 release and the library is
+ generally fairly stable in use.</p>
+
+ </content>
+