hpricot 0.8.2 → 0.8.3
- data/CHANGELOG +19 -3
- data/{README → README.md} +84 -83
- data/Rakefile +85 -123
- data/ext/fast_xs/FastXsService.java +94 -1
- data/ext/fast_xs/fast_xs.c +60 -51
- data/ext/hpricot_scan/HpricotCss.java +50 -31
- data/ext/hpricot_scan/HpricotScanService.java +77 -64
- data/ext/hpricot_scan/extconf.rb +3 -0
- data/ext/hpricot_scan/hpricot_css.c +331 -323
- data/ext/hpricot_scan/hpricot_css.rl +6 -1
- data/ext/hpricot_scan/hpricot_scan.c +922 -810
- data/ext/hpricot_scan/hpricot_scan.java.rl +22 -13
- data/ext/hpricot_scan/hpricot_scan.rl +198 -90
- data/extras/hpricot.png +0 -0
- data/lib/hpricot/builder.rb +1 -1
- data/lib/hpricot/elements.rb +28 -24
- data/lib/hpricot/htmlinfo.rb +1 -1
- data/lib/hpricot/tag.rb +2 -2
- data/lib/hpricot/tags.rb +4 -4
- data/lib/hpricot/traverse.rb +25 -25
- data/lib/hpricot/xchar.rb +3 -3
- data/test/files/basic.xhtml +1 -1
- data/test/test_parser.rb +31 -2
- metadata +21 -9
- data/extras/mingw-rbconfig.rb +0 -176
data/CHANGELOG
CHANGED
@@ -1,16 +1,32 @@
+= 0.8.3
+=== 3 November, 2010
+* GH#8: Nil-check before downcasing attribute key
+* GH#25: Proper ruby 1.9 encoding support
+* GH#28. Use integers instead of ?? on 1.9, which is just a string.
+* including noscript to ElementInclusions , so that hpricot wont fail
+when trying to parse a meta tag inside head section when noscript is
+present.
+* latest changes from fast_xs mainline
+* Fixes to get Hpricot running on Rubinius:
+* Use free, not XFREE
+* Remove RSTRUCT craziness, don't break Array#at
+
 = 0.8.2
 === 5 November, 2009
 * Bring JRuby support up to speed, including Java-based hpricot_css support
 * Change JRuby fast_xs to have same escaping behavior as C fast_xs
-
-= 0.8.1
-=== 3 April, 2009
 * fix for issue #2, downcasing of html attributes inside the parser.
 * solve issue #3 with bogus etags being preserved in `to_s` rather than just `to_original_html`.
 * fix error when attempting to reparent cleared node. (issue #5)
 * Hpricot::Attributes proxy object for using `ele.attributes[k] = v` directly.
 however, it is preferred to use the jquery-like `elements.attr(k, v)`.
 
+= 0.8.1
+=== 3 April, 2009
+* big problems on Ruby 1.8.6, use INT2FIX instead of INT2NUM. hashes were being cast to bignums.
+* patch for 1.8.5 to define RARRAY_PTR. thanks, mike perham!
+* inspecting empty document bug, courtesy of @TalLevAmi.
+
 = 0.8
 === 31st March, 2009
 * Saving memory and speed by using RStruct-based elements in the C extension.
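Note on the GH#28 entry: from Ruby 1.9 onward a character literal such as `??` evaluates to a one-character String, whereas on 1.8 it was an Integer, so byte comparisons written against such literals stop matching. The sketch below only illustrates that language behaviour; it assumes Ruby 1.9 or later, and the variable names are made up for the example rather than taken from Hpricot.

```ruby
# Assumes Ruby 1.9 or later. Variable names are illustrative, not from Hpricot.
question_mark = ??                                # the character literal for "?"

literal_is_string = question_mark.is_a?(String)   # true on 1.9+, false on 1.8
code_point        = question_mark.ord             # 63

# Comparing a byte against a plain integer behaves the same on every version.
first_byte     = "?x".bytes.first                 # 63
matches_as_int = (first_byte == 63)
matches_as_lit = (first_byte == question_mark)    # Integer vs String on 1.9+
```

Writing the integer directly avoids depending on how the running interpreter defines the literal.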
data/{README → README.md}
RENAMED
@@ -1,4 +1,4 @@
-
+# Hpricot, Read Any HTML
 
 Hpricot is a fast, flexible HTML parser written in C. It's designed to be very
 accommodating (like Tanaka Akira's HTree) and to have a very helpful library
@@ -13,21 +13,21 @@ thing.
 *Please read this entire document* before making assumptions about how this
 software works.
 
-
+## An Overview
 
 Let's clear up what Hpricot is.
 
-
-
+* Hpricot is *a standalone library*. It requires no other libraries. Just Ruby!
+* While priding itself on speed, Hpricot *works hard to sort out bad HTML* and
 pays a small penalty in order to get that right. So that's slightly more important
 to me than speed.
-
+* *If you can see it in Firefox, then Hpricot should parse it.* That's
 how it should be! Let me know the minute it's otherwise.
-
+* Primarily, Hpricot is used for reading HTML and tries to sort out troubled
 HTML by having some idea of what good HTML is. Some people still like to use
 Hpricot for XML reading, but *remember to use the Hpricot::XML() method* for that!
 
-
+## The Hpricot Kingdom
 
 First, here are all the links you need to know:
 
@@ -43,57 +43,57 @@ not going to say "Use at your own risk" because I don't want this library to be
 risky. If you trip on something, I'll share the liability by repairing things
 as quickly as I can. Your responsibility is to report the inadequacies.
 
-
+## Installing Hpricot
 
 You may get the latest stable version from Rubyforge. Win32 binaries,
 Java binaries (for JRuby), and source gems are available.
 
-
+$ gem install hpricot
 
-
+## An Hpricot Showcase
 
 We're going to run through a big pile of examples to get you jump-started.
 Many of these examples are also found at
 http://wiki.github.com/hpricot/hpricot/hpricot-basics, in case you
 want to add some of your own.
 
-
+### Loading Hpricot Itself
 
 You have probably got the gem, right? To load Hpricot:
 
-
-
+require 'rubygems'
+require 'hpricot'
 
 If you've installed the plain source distribution, go ahead and just:
 
-
+require 'hpricot'
 
-
+### Load an HTML Page
 
 The <tt>Hpricot()</tt> method takes a string or any IO object and loads the
 contents into a document object.
 
-
+doc = Hpricot("<p>A simple <b>test</b> string.</p>")
 
 To load from a file, just get the stream open:
 
-
+doc = open("index.html") { |f| Hpricot(f) }
 
 To load from a web URL, use <tt>open-uri</tt>, which comes with Ruby:
 
-
-
+require 'open-uri'
+doc = open("http://qwantz.com/") { |f| Hpricot(f) }
 
 Hpricot uses an internal buffer to parse the file, so the IO will stream
 properly and large documents won't be loaded into memory all at once. However,
 the parsed document object will be present in memory, in its entirety.
 
-
+### Search for Elements
 
 Use <tt>Doc.search</tt>:
 
-
-
+doc.search("//p[@class='posted']")
+#=> #<Hpricot:Elements[{p ...}, {p ...}]>
 
 <tt>Doc.search</tt> can take an XPath or CSS expression. In the above example,
 all paragraph <tt><p></tt> elements are grabbed which have a <tt>class</tt>
@@ -101,126 +101,127 @@ attribute of <tt>"posted"</tt>.
 
 A shortcut is to use the divisor:
 
-
-
+(doc/"p.posted")
+#=> #<Hpricot:Elements[{p ...}, {p ...}]>
 
-
+### Finding Just One Element
 
 If you're looking for a single element, the <tt>at</tt> method will return the
 first element matched by the expression. In this case, you'll get back the
 element itself rather than the <tt>Hpricot::Elements</tt> array.
 
-
+doc.at("body")['onload']
 
 The above code will find the body tag and give you back the <tt>onload</tt>
 attribute. This is the most common reason to use the element directly: when
 reading and writing HTML attributes.
 
-
+### Fetching the Contents of an Element
 
 Just as with browser scripting, the <tt>inner_html</tt> property can be used to
 get the inner contents of an element.
 
-
-
+(doc/"#elementID").inner_html
+#=> "..contents.."
 
 If your expression matches more than one element, you'll get back the contents
 of ''all the matched elements''. So you may want to use <tt>first</tt> to be
 sure you get back only one.
 
-
-
+(doc/"#elementID").first.inner_html
+#=> "..contents.."
 
-
+### Fetching the HTML for an Element
 
 If you want the HTML for the whole element (not just the contents), use
 <tt>to_html</tt>:
 
-
-
+(doc/"#elementID").to_html
+#=> "<div id='elementID'>...</div>"
 
-
+### Looping
 
 All searches return a set of <tt>Hpricot::Elements</tt>. Go ahead and loop
 through them like you would an array.
 
-
-
-
+(doc/"p/a/img").each do |img|
+puts img.attributes['class']
+end
 
-
+### Continuing Searches
 
 Searches can be continued from a collection of elements, in order to search deeper.
 
-
-
-
-
-
+# find all paragraphs.
+elements = doc.search("/html/body//p")
+# continue the search by finding any images within those paragraphs.
+(elements/"img")
+#=> #<Hpricot::Elements[{img ...}, {img ...}]>
 
 Searches can also be continued by searching within container elements.
 
-
-
-
-
+# find all images within paragraphs.
+doc.search("/html/body//p").each do |para|
+puts "== Found a paragraph =="
+pp para
 
-
-
-
-
-
+imgs = para.search("img")
+if imgs.any?
+puts "== Found #{imgs.length} images inside =="
+end
+end
 
 Of course, the most succinct ways to do the above are using CSS or XPath.
 
-
-
-
-
-
-
+# the xpath version
+(doc/"/html/body//p//img")
+# the css version
+(doc/"html > body > p img")
+# ..or symbols work, too!
+(doc/:html/:body/:p/:img)
 
-
+### Looping Edits
 
 You may certainly edit objects from within your search loops. Then, when you
 spit out the HTML, the altered elements will show.
 
-
-
-
-
+
+(doc/"span.entryPermalink").each do |span|
+span.attributes['class'] = 'newLinks'
+end
+puts doc
 
 This changes all <tt>span.entryPermalink</tt> elements to
 <tt>span.newLinks</tt>. Keep in mind that there are often more convenient ways
 of doing this. Such as the <tt>set</tt> method:
 
-
+(doc/"span.entryPermalink").set(:class => 'newLinks')
 
-
+### Figuring Out Paths
 
 Every element can tell you its unique path (either XPath or CSS) to get to the
 element from the root tag.
 
 The <tt>css_path</tt> method:
 
-
-
-
-
+doc.at("div > div:nth(1)").css_path
+#=> "div > div:nth(1)"
+doc.at("#header").css_path
+#=> "#header"
 
 Or, the <tt>xpath</tt> method:
 
-
-
-
-
+doc.at("div > div:nth(1)").xpath
+#=> "/div/div:eq(1)"
+doc.at("#header").xpath
+#=> "//div[@id='header']"
 
-
+## Hpricot Fixups
 
 When loading HTML documents, you have a few settings that can make Hpricot more
 or less intense about how it gets involved.
 
-
+## :fixup_tags
 
 Really, there are so many ways to clean up HTML and your intentions may be to
 keep the HTML as-is. So Hpricot's default behavior is to keep things flexible.
@@ -229,7 +230,7 @@ Making sure to open and close all the tags, but ignore any validation problems.
 As of Hpricot 0.4, there's a new <tt>:fixup_tags</tt> option which will attempt
 to shift the document's tags to meet XHTML 1.0 Strict.
 
-
+doc = open("index.html") { |f| Hpricot f, :fixup_tags => true }
 
 This doesn't quite meet the XHTML 1.0 Strict standard, it just tries to follow
 the rules a bit better. Like: say Hpricot finds a paragraph in a link, it's
@@ -238,13 +239,13 @@ where paragraphs don't belong.
 
 If an unknown element is found, it is ignored. Again, <tt>:fixup_tags</tt>.
 
-
+## :xhtml_strict
 
 So, let's go beyond just trying to fix the hierarchy. The
 <tt>:xhtml_strict</tt> option really tries to force the document to be an XHTML
 1.0 Strict document. Even at the cost of removing elements that get in the way.
 
-
+doc = open("index.html") { |f| Hpricot f, :xhtml_strict => true }
 
 What measures does <tt>:xhtml_strict</tt> take?
 
@@ -254,7 +255,7 @@ What measures does <tt>:xhtml_strict</tt> take?
 4. Remove illegal content.
 5. Alter the doctype to XHTML 1.0 Strict.
 
-
+## Hpricot.XML()
 
 The last option is the <tt>:xml</tt> option, which makes some slight variations
 on the standard mode. The main difference is that :xml mode won't try to output
@@ -266,9 +267,9 @@ to case, friends.
 
 The primary way to use Hpricot's XML mode is to call the Hpricot.XML method:
 
-
-
-
+doc = open("http://redhanded.hobix.com/index.xml") do |f|
+Hpricot.XML(f)
+end
 
 *Also, :fixup_tags is canceled out by the :xml option.* This is because
 :fixup_tags makes assumptions based how HTML is structured. Specifically, how
data/Rakefile
CHANGED
@@ -1,10 +1,12 @@
-require 'rake'
 require 'rake/clean'
 require 'rake/gempackagetask'
 require 'rake/rdoctask'
 require 'rake/testtask'
-
-
+begin
+require 'rake/extensiontask'
+rescue LoadError
+abort "To build, please first gem install rake-compiler"
+end
 
 RbConfig = Config unless defined?(RbConfig)
 
@@ -12,13 +14,14 @@ NAME = "hpricot"
 REV = (`#{ENV['GIT'] || "git"} rev-list HEAD`.split.length + 1).to_s
 VERS = ENV['VERSION'] || "0.8" + (REV ? ".#{REV}" : "")
 PKG = "#{NAME}-#{VERS}"
-BIN = "*.{bundle,jar,so,o,obj,pdb,lib,def,exp,class}"
-CLEAN.include ["
+BIN = "*.{bundle,jar,so,o,obj,pdb,lib,def,exp,class,rbc}"
+CLEAN.include ["#{BIN}", "ext/**/#{BIN}", "lib/**/#{BIN}", "test/**/#{BIN}",
 'ext/fast_xs/Makefile', 'ext/hpricot_scan/Makefile',
-'**/.*.sw?', '*.gem', '.config', 'pkg']
-RDOC_OPTS = ['--quiet', '--title', 'The Hpricot Reference', '--main', 'README', '--inline-source']
-PKG_FILES = %w(CHANGELOG COPYING README Rakefile) +
-Dir.glob("{bin,doc,test,
+'**/.*.sw?', '*.gem', '.config', 'pkg', 'lib/hpricot_scan.rb', 'lib/fast_xs.rb']
+RDOC_OPTS = ['--quiet', '--title', 'The Hpricot Reference', '--main', 'README.md', '--inline-source']
+PKG_FILES = %w(CHANGELOG COPYING README.md Rakefile) +
+Dir.glob("{bin,doc,test,extras}/**/*") +
+(Dir.glob("lib/**/*.rb") - %w(lib/hpricot_scan.rb lib/fast_xs.rb)) +
 Dir.glob("ext/**/*.{h,java,c,rb,rl}") +
 %w[ext/hpricot_scan/hpricot_scan.c ext/hpricot_scan/hpricot_css.c ext/hpricot_scan/HpricotScanService.java] # needed because they are generated later
 RAGEL_C_CODE_GENERATION_STYLES = {
@@ -39,7 +42,7 @@ SPEC =
 s.platform = Gem::Platform::RUBY
 s.has_rdoc = true
 s.rdoc_options += RDOC_OPTS
-s.extra_rdoc_files = ["README", "CHANGELOG", "COPYING"]
+s.extra_rdoc_files = ["README.md", "CHANGELOG", "COPYING"]
 s.summary = "a swift, liberal HTML parser with a fantastic library"
 s.description = s.summary
 s.author = "why the lucky stiff"
@@ -52,26 +55,54 @@ SPEC =
 s.bindir = "bin"
 end
 
-
-
-
-
+# FAT cross-compile
+# Pass RUBY_CC_VERSION=1.8.7:1.9.2 when packaging for 1.8+1.9 mswin32 binaries
+%w(hpricot_scan fast_xs).each do |target|
+Rake::ExtensionTask.new(target, SPEC) do |ext|
+ext.lib_dir = File.join('lib', target) if ENV['RUBY_CC_VERSION']
+ext.cross_compile = true # enable cross compilation (requires cross compile toolchain)
+ext.cross_platform = 'i386-mswin32' # forces the Windows platform instead of the default one
+end
 
-
+# HACK around 1.9.2 cross .def file creation
+def_file = "tmp/i386-mswin32/#{target}/1.9.2/#{target}-i386-mingw32.def"
+directory File.dirname(def_file)
+file def_file => File.dirname(def_file) do |t|
+File.open(t.name, "w") do |f|
+f << "EXPORTS\nInit_#{target}\n"
+end
+end
+
+task File.join(File.dirname(def_file), "Makefile") => def_file
+# END HACK
+file "lib/#{target}.rb" do |t|
+File.open(t.name, "w") do |f|
+f.puts %{require "#{target}/\#{RUBY_VERSION.sub(/\\.\\d+$/, '')}/#{target}"}
+end
+end
+end
+file 'ext/hpricot_scan/extconf.rb' => :ragel
+
+desc "set environment variables to build and/or test with debug options"
+task :debug do
+ENV['CFLAGS'] ||= ""
+ENV['CFLAGS'] += " -g -DDEBUG"
+end
 
 desc "Does a full compile, test run"
 if defined?(JRUBY_VERSION)
-task :default => [:compile_java, :test]
+task :default => [:compile_java, :clean_fat_rb, :test]
 else
-task :default => [:compile, :test]
+task :default => [:compile, :clean_fat_rb, :test]
 end
 
-
-
-
-
-task :release => [:package, :package_win32, :package_jruby]
+task :clean_fat_rb do
+rm_f "lib/hpricot_scan.rb"
+rm_f "lib/fast_xs.rb"
+end
 
+desc "Packages up Hpricot for all platforms."
+task :package => [:clean]
 
 desc "Run all the tests"
 Rake::TestTask.new do |t|
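Note on the generated `lib/#{target}.rb` stub in the hunk above: the file it writes contains a single `require` whose path is built from `RUBY_VERSION` with the patch-level component stripped, so a 1.8.x interpreter loads the binary under `1.8/` and a 1.9.x interpreter the one under `1.9/`. A minimal sketch of that path computation follows; `fat_binary_path` is a hypothetical helper introduced here for illustration and is not part of the Rakefile.

```ruby
# Hypothetical helper mirroring the expression written into lib/<target>.rb.
# The version is passed in explicitly so other versions can be tried.
def fat_binary_path(target, ruby_version = RUBY_VERSION)
  minor_series = ruby_version.sub(/\.\d+$/, '')   # drops the last ".N" component
  "#{target}/#{minor_series}/#{target}"
end

path_for_187 = fat_binary_path("hpricot_scan", "1.8.7")  # "hpricot_scan/1.8/hpricot_scan"
path_for_192 = fat_binary_path("fast_xs", "1.9.2")       # "fast_xs/1.9/fast_xs"
```

This matches the `lib/#{t}/1.8/#{t}.so` and `lib/#{t}/1.9/#{t}.so` entries that the Win32 gem spec lists later in the same file.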
@@ -83,8 +114,8 @@ end
 Rake::RDocTask.new do |rdoc|
 rdoc.rdoc_dir = 'doc/rdoc'
 rdoc.options += RDOC_OPTS
-rdoc.main = "README"
-rdoc.rdoc_files.add ['README', 'CHANGELOG', 'COPYING', 'lib/**/*.rb']
+rdoc.main = "README.md"
+rdoc.rdoc_files.add ['README.md', 'CHANGELOG', 'COPYING', 'lib/**/*.rb']
 end
 
 Rake::GemPackageTask.new(SPEC) do |p|
@@ -92,53 +123,32 @@ Rake::GemPackageTask.new(SPEC) do |p|
 p.gem_spec = SPEC
 end
 
-
-
-
-
-
-"
-
-"#{ext}/extconf.rb",
-"#{ext}/Makefile",
-"lib"
-]
-
-desc "Builds just the #{extension} extension"
-task extension.to_sym => [:ragel, "#{ext}/Makefile", ext_so ]
-
-file "#{ext}/Makefile" => ["#{ext}/extconf.rb"] do
-Dir.chdir(ext) do ruby "extconf.rb" end
-end
-
-file ext_so => ext_files do
-Dir.chdir(ext) do
-sh(RUBY_PLATFORM =~ /mswin/ ? 'nmake' : 'make')
+### Win32 Packages ###
+Win32Spec = SPEC.dup
+Win32Spec.platform = 'i386-mswin32'
+Win32Spec.files = PKG_FILES + %w(hpricot_scan fast_xs).map do |t|
+unless ENV['RUBY_CC_VERSION']
+file "lib/#{t}/1.8/#{t}.so" do
+abort "ERROR while packaging: re-run for fat win32 gems:\nrake #{ARGV.join(' ')} RUBY_CC_VERSION=1.8.7:1.9.2"
 end
-cp ext_so, "lib"
 end
+["lib/#{t}.rb", "lib/#{t}/1.8/#{t}.so", "lib/#{t}/1.9/#{t}.so"]
+end.flatten
+Win32Spec.extensions = []
 
-
-
-
-sh "cd #{WIN32_PKG_DIR}/ext/#{extension}/ && ruby -I. extconf.rb && make"
-mv "#{WIN32_PKG_DIR}/ext/#{extension}/#{extension}.so", "#{WIN32_PKG_DIR}/lib"
-end
+Rake::GemPackageTask.new(Win32Spec) do |p|
+p.need_tar = false
+p.gem_spec = Win32Spec
 end
 
-
-
-
+JRubySpec = SPEC.dup
+JRubySpec.platform = 'java'
+JRubySpec.files = PKG_FILES + ["lib/hpricot_scan.jar", "lib/fast_xs.jar"]
+JRubySpec.extensions = []
 
-
-
-
-STDERR.puts "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
-STDERR.puts "Gem actually failed to build. Your system is"
-STDERR.puts "NOT configured properly to build hpricot."
-STDERR.puts "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
-exit(1)
-end
+Rake::GemPackageTask.new(JRubySpec) do |p|
+p.need_tar = false
+p.gem_spec = JRubySpec
 end
 
 desc "Determines the Ragel version and displays it on the console along with the location of the Ragel binary."
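Note on `Win32Spec.files` in the hunk above: each extension name is mapped to three paths (the loader stub plus one `.so` per Ruby series), and the nested arrays are flattened before being appended to the common package files. The sketch below shows only that list construction; `pkg_files` is a shortened stand-in for `PKG_FILES`, which the real Rakefile builds from globs.

```ruby
# Illustrative stand-in for PKG_FILES; the real list is built from Dir.glob calls.
pkg_files = %w(CHANGELOG COPYING README.md Rakefile)

# Three entries per extension: the version-dispatching stub and two binaries.
fat_files = %w(hpricot_scan fast_xs).map { |t|
  ["lib/#{t}.rb", "lib/#{t}/1.8/#{t}.so", "lib/#{t}/1.9/#{t}.so"]
}.flatten

win32_files = pkg_files + fat_files
```

The sketch uses a brace block purely for compactness; the Rakefile itself uses `do ... end` followed by `.flatten`.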
@@ -178,27 +188,7 @@ task :ragel_java => [:ragel_version] do
 end
 end
 
-###
-
-desc "Package up the Win32 distribution."
-file WIN32_PKG_DIR => [:package] do
-sh "tar zxf pkg/#{PKG}.tgz"
-mv PKG, WIN32_PKG_DIR
-end
-
-desc "Build the binary RubyGems package for win32"
-task :package_win32 => ["fast_xs_win32", "hpricot_scan_win32"] do
-Dir.chdir("#{WIN32_PKG_DIR}") do
-Gem::Builder.new(Win32Spec).build
-verbose(true) {
-mv Dir["*.gem"].first, "../pkg/"
-}
-end
-end
-
-CLEAN.include WIN32_PKG_DIR
-
-### JRuby Packages ###
+### JRuby Compile ###
 
 def java_classpath_arg # myriad of ways to discover JRuby classpath
 begin
@@ -211,7 +201,11 @@ def java_classpath_arg # myriad of ways to discover JRuby classpath
 jruby_cpath = ENV['JRUBY_PARENT_CLASSPATH'] || ENV['JRUBY_HOME'] &&
 FileList["#{ENV['JRUBY_HOME']}/lib/*.jar"].join(File::PATH_SEPARATOR)
 end
-jruby_cpath
+unless jruby_cpath || ENV['CLASSPATH'] =~ /jruby/
+abort %{WARNING: No JRuby classpath has been set up.
+Define JRUBY_HOME=/path/to/jruby on the command line or in the environment}
+end
+"-cp \"#{jruby_cpath}\""
 end
 
 def compile_java(filenames, jarname)
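Note on the `java_classpath_arg` change in the hunk above: the new version aborts when no JRuby classpath can be found, and otherwise returns a quoted `-cp` argument. Only the tail of the method is visible in this diff, so the sketch below is a simplified, hypothetical variant of the visible logic: `classpath_arg` and its `home_jars` parameter are invented for illustration, it reads from a plain Hash instead of `ENV`, and it raises rather than calling `abort` so it can be tried without a JRuby install.

```ruby
# Hypothetical, simplified variant of the check shown in the hunk above.
# env is a Hash standing in for ENV; home_jars stands in for the FileList glob.
def classpath_arg(env, home_jars = [])
  cpath = env['JRUBY_PARENT_CLASSPATH']
  cpath ||= home_jars.join(File::PATH_SEPARATOR) if env['JRUBY_HOME']

  # Give up only when neither a derived classpath nor a jruby CLASSPATH exists.
  unless cpath || env['CLASSPATH'].to_s.include?('jruby')
    raise ArgumentError, 'No JRuby classpath has been set up; define JRUBY_HOME'
  end

  %(-cp "#{cpath}")
end

arg_from_parent = classpath_arg({ 'JRUBY_PARENT_CLASSPATH' => '/opt/jruby/lib/jruby.jar' })
```

As in the Rakefile, a `CLASSPATH` that already mentions jruby passes the check even when no classpath was derived, in which case the returned argument carries an empty string.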
@@ -231,42 +225,10 @@ task :fast_xs_java do
 end
 end
 
-
-
-
-end
-
-JRubySpec = SPEC.dup
-JRubySpec.platform = 'java'
-JRubySpec.files = PKG_FILES + ["lib/hpricot_scan.jar", "lib/fast_xs.jar"]
-JRubySpec.extensions = []
-
-JRUBY_PKG_DIR = "#{PKG}-java"
-
-desc "Package up the JRuby distribution."
-file JRUBY_PKG_DIR => [:ragel_java, :package] do
-sh "tar zxf pkg/#{PKG}.tgz"
-mv PKG, JRUBY_PKG_DIR
-end
-
-desc "Build the RubyGems package for JRuby"
-task :package_jruby => JRUBY_PKG_DIR do
-Dir.chdir("#{JRUBY_PKG_DIR}") do
-Rake::Task[:compile_java].invoke
-Gem::Builder.new(JRubySpec).build
-verbose(true) {
-mv Dir["*.gem"].first, "../pkg/#{JRUBY_PKG_DIR}.gem"
-}
+%w(hpricot_scan fast_xs).each do |ext|
+file "lib/#{ext}.jar" => "#{ext}_java" do |t|
+mv "ext/#{ext}/#{ext}.jar", "lib"
 end
+task :compile_java => "lib/#{ext}.jar"
 end
 
-CLEAN.include JRUBY_PKG_DIR
-
-task :install do
-sh %{rake package}
-sh %{sudo gem install pkg/#{NAME}-#{VERS}}
-end
-
-task :uninstall => [:clean] do
-sh %{sudo gem uninstall #{NAME}}
-end