hpricot 0.8.2 → 0.8.3
- data/CHANGELOG +19 -3
- data/{README → README.md} +84 -83
- data/Rakefile +85 -123
- data/ext/fast_xs/FastXsService.java +94 -1
- data/ext/fast_xs/fast_xs.c +60 -51
- data/ext/hpricot_scan/HpricotCss.java +50 -31
- data/ext/hpricot_scan/HpricotScanService.java +77 -64
- data/ext/hpricot_scan/extconf.rb +3 -0
- data/ext/hpricot_scan/hpricot_css.c +331 -323
- data/ext/hpricot_scan/hpricot_css.rl +6 -1
- data/ext/hpricot_scan/hpricot_scan.c +922 -810
- data/ext/hpricot_scan/hpricot_scan.java.rl +22 -13
- data/ext/hpricot_scan/hpricot_scan.rl +198 -90
- data/extras/hpricot.png +0 -0
- data/lib/hpricot/builder.rb +1 -1
- data/lib/hpricot/elements.rb +28 -24
- data/lib/hpricot/htmlinfo.rb +1 -1
- data/lib/hpricot/tag.rb +2 -2
- data/lib/hpricot/tags.rb +4 -4
- data/lib/hpricot/traverse.rb +25 -25
- data/lib/hpricot/xchar.rb +3 -3
- data/test/files/basic.xhtml +1 -1
- data/test/test_parser.rb +31 -2
- metadata +21 -9
- data/extras/mingw-rbconfig.rb +0 -176
data/CHANGELOG
CHANGED
@@ -1,16 +1,32 @@
+= 0.8.3
+=== 3 November, 2010
+* GH#8: Nil-check before downcasing attribute key
+* GH#25: Proper ruby 1.9 encoding support
+* GH#28. Use integers instead of ?? on 1.9, which is just a string.
+* including noscript to ElementInclusions , so that hpricot wont fail
+when trying to parse a meta tag inside head section when noscript is
+present.
+* latest changes from fast_xs mainline
+* Fixes to get Hpricot running on Rubinius:
+* Use free, not XFREE
+* Remove RSTRUCT craziness, don't break Array#at
+
 = 0.8.2
 === 5 November, 2009
 * Bring JRuby support up to speed, including Java-based hpricot_css support
 * Change JRuby fast_xs to have same escaping behavior as C fast_xs
-
-= 0.8.1
-=== 3 April, 2009
 * fix for issue #2, downcasing of html attributes inside the parser.
 * solve issue #3 with bogus etags being preserved in `to_s` rather than just `to_original_html`.
 * fix error when attempting to reparent cleared node. (issue #5)
 * Hpricot::Attributes proxy object for using `ele.attributes[k] = v` directly.
 however, it is preferred to use the jquery-like `elements.attr(k, v)`.
 
+= 0.8.1
+=== 3 April, 2009
+* big problems on Ruby 1.8.6, use INT2FIX instead of INT2NUM. hashes were being cast to bignums.
+* patch for 1.8.5 to define RARRAY_PTR. thanks, mike perham!
+* inspecting empty document bug, courtesy of @TalLevAmi.
+
 = 0.8
 === 31st March, 2009
 * Saving memory and speed by using RStruct-based elements in the C extension.
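Note on the GH#28 entry above: it refers to Ruby's character-literal syntax. On Ruby 1.8, `?a` evaluated to the integer 97; from 1.9 onward it evaluates to the one-character string `"a"`, so code that compares byte values against `?x` literals stops matching. The following is a minimal sketch of a portable way to get the integer code. `char_code` is a hypothetical helper written for this note and is not part of Hpricot; the actual fix in the gem may look different.

```ruby
# Hypothetical helper (not Hpricot's API): return the integer code of a
# character literal whether ?x is an Integer (Ruby 1.8) or a
# one-character String (Ruby 1.9 and later).
def char_code(literal)
  literal.is_a?(Integer) ? literal : literal.ord
end

# On Ruby 1.9+ a character literal is just a String.
puts ?a.class

# The helper yields the integer code of "a" either way.
puts char_code(?a)

# Plain integers pass through unchanged, which is why writing the
# integer directly is the simplest portable choice.
puts char_code(38)
```

Writing the integer directly, as the changelog entry describes, avoids the version check entirely.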
data/{README → README.md}
RENAMED
@@ -1,4 +1,4 @@
|
|
1
|
-
|
1
|
+
# Hpricot, Read Any HTML
|
2
2
|
|
3
3
|
Hpricot is a fast, flexible HTML parser written in C. It's designed to be very
|
4
4
|
accommodating (like Tanaka Akira's HTree) and to have a very helpful library
|
@@ -13,21 +13,21 @@ thing.
|
|
13
13
|
*Please read this entire document* before making assumptions about how this
|
14
14
|
software works.
|
15
15
|
|
16
|
-
|
16
|
+
## An Overview
|
17
17
|
|
18
18
|
Let's clear up what Hpricot is.
|
19
19
|
|
20
|
-
|
21
|
-
|
20
|
+
* Hpricot is *a standalone library*. It requires no other libraries. Just Ruby!
|
21
|
+
* While priding itself on speed, Hpricot *works hard to sort out bad HTML* and
|
22
22
|
pays a small penalty in order to get that right. So that's slightly more important
|
23
23
|
to me than speed.
|
24
|
-
|
24
|
+
* *If you can see it in Firefox, then Hpricot should parse it.* That's
|
25
25
|
how it should be! Let me know the minute it's otherwise.
|
26
|
-
|
26
|
+
* Primarily, Hpricot is used for reading HTML and tries to sort out troubled
|
27
27
|
HTML by having some idea of what good HTML is. Some people still like to use
|
28
28
|
Hpricot for XML reading, but *remember to use the Hpricot::XML() method* for that!
|
29
29
|
|
30
|
-
|
30
|
+
## The Hpricot Kingdom
|
31
31
|
|
32
32
|
First, here are all the links you need to know:
|
33
33
|
|
@@ -43,57 +43,57 @@ not going to say "Use at your own risk" because I don't want this library to be
|
|
43
43
|
risky. If you trip on something, I'll share the liability by repairing things
|
44
44
|
as quickly as I can. Your responsibility is to report the inadequacies.
|
45
45
|
|
46
|
-
|
46
|
+
## Installing Hpricot
|
47
47
|
|
48
48
|
You may get the latest stable version from Rubyforge. Win32 binaries,
|
49
49
|
Java binaries (for JRuby), and source gems are available.
|
50
50
|
|
51
|
-
|
51
|
+
$ gem install hpricot
|
52
52
|
|
53
|
-
|
53
|
+
## An Hpricot Showcase
|
54
54
|
|
55
55
|
We're going to run through a big pile of examples to get you jump-started.
|
56
56
|
Many of these examples are also found at
|
57
57
|
http://wiki.github.com/hpricot/hpricot/hpricot-basics, in case you
|
58
58
|
want to add some of your own.
|
59
59
|
|
60
|
-
|
60
|
+
### Loading Hpricot Itself
|
61
61
|
|
62
62
|
You have probably got the gem, right? To load Hpricot:
|
63
63
|
|
64
|
-
|
65
|
-
|
64
|
+
require 'rubygems'
|
65
|
+
require 'hpricot'
|
66
66
|
|
67
67
|
If you've installed the plain source distribution, go ahead and just:
|
68
68
|
|
69
|
-
|
69
|
+
require 'hpricot'
|
70
70
|
|
71
|
-
|
71
|
+
### Load an HTML Page
|
72
72
|
|
73
73
|
The <tt>Hpricot()</tt> method takes a string or any IO object and loads the
|
74
74
|
contents into a document object.
|
75
75
|
|
76
|
-
|
76
|
+
doc = Hpricot("<p>A simple <b>test</b> string.</p>")
|
77
77
|
|
78
78
|
To load from a file, just get the stream open:
|
79
79
|
|
80
|
-
|
80
|
+
doc = open("index.html") { |f| Hpricot(f) }
|
81
81
|
|
82
82
|
To load from a web URL, use <tt>open-uri</tt>, which comes with Ruby:
|
83
83
|
|
84
|
-
|
85
|
-
|
84
|
+
require 'open-uri'
|
85
|
+
doc = open("http://qwantz.com/") { |f| Hpricot(f) }
|
86
86
|
|
87
87
|
Hpricot uses an internal buffer to parse the file, so the IO will stream
|
88
88
|
properly and large documents won't be loaded into memory all at once. However,
|
89
89
|
the parsed document object will be present in memory, in its entirety.
|
90
90
|
|
91
|
-
|
91
|
+
### Search for Elements
|
92
92
|
|
93
93
|
Use <tt>Doc.search</tt>:
|
94
94
|
|
95
|
-
|
96
|
-
|
95
|
+
doc.search("//p[@class='posted']")
|
96
|
+
#=> #<Hpricot:Elements[{p ...}, {p ...}]>
|
97
97
|
|
98
98
|
<tt>Doc.search</tt> can take an XPath or CSS expression. In the above example,
|
99
99
|
all paragraph <tt><p></tt> elements are grabbed which have a <tt>class</tt>
|
@@ -101,126 +101,127 @@ attribute of <tt>"posted"</tt>.
|
|
101
101
|
|
102
102
|
A shortcut is to use the divisor:
|
103
103
|
|
104
|
-
|
105
|
-
|
104
|
+
(doc/"p.posted")
|
105
|
+
#=> #<Hpricot:Elements[{p ...}, {p ...}]>
|
106
106
|
|
107
|
-
|
107
|
+
### Finding Just One Element
|
108
108
|
|
109
109
|
If you're looking for a single element, the <tt>at</tt> method will return the
|
110
110
|
first element matched by the expression. In this case, you'll get back the
|
111
111
|
element itself rather than the <tt>Hpricot::Elements</tt> array.
|
112
112
|
|
113
|
-
|
113
|
+
doc.at("body")['onload']
|
114
114
|
|
115
115
|
The above code will find the body tag and give you back the <tt>onload</tt>
|
116
116
|
attribute. This is the most common reason to use the element directly: when
|
117
117
|
reading and writing HTML attributes.
|
118
118
|
|
119
|
-
|
119
|
+
### Fetching the Contents of an Element
|
120
120
|
|
121
121
|
Just as with browser scripting, the <tt>inner_html</tt> property can be used to
|
122
122
|
get the inner contents of an element.
|
123
123
|
|
124
|
-
|
125
|
-
|
124
|
+
(doc/"#elementID").inner_html
|
125
|
+
#=> "..contents.."
|
126
126
|
|
127
127
|
If your expression matches more than one element, you'll get back the contents
|
128
128
|
of ''all the matched elements''. So you may want to use <tt>first</tt> to be
|
129
129
|
sure you get back only one.
|
130
130
|
|
131
|
-
|
132
|
-
|
131
|
+
(doc/"#elementID").first.inner_html
|
132
|
+
#=> "..contents.."
|
133
133
|
|
134
|
-
|
134
|
+
### Fetching the HTML for an Element
|
135
135
|
|
136
136
|
If you want the HTML for the whole element (not just the contents), use
|
137
137
|
<tt>to_html</tt>:
|
138
138
|
|
139
|
-
|
140
|
-
|
139
|
+
(doc/"#elementID").to_html
|
140
|
+
#=> "<div id='elementID'>...</div>"
|
141
141
|
|
142
|
-
|
142
|
+
### Looping
|
143
143
|
|
144
144
|
All searches return a set of <tt>Hpricot::Elements</tt>. Go ahead and loop
|
145
145
|
through them like you would an array.
|
146
146
|
|
147
|
-
|
148
|
-
|
149
|
-
|
147
|
+
(doc/"p/a/img").each do |img|
|
148
|
+
puts img.attributes['class']
|
149
|
+
end
|
150
150
|
|
151
|
-
|
151
|
+
### Continuing Searches
|
152
152
|
|
153
153
|
Searches can be continued from a collection of elements, in order to search deeper.
|
154
154
|
|
155
|
-
|
156
|
-
|
157
|
-
|
158
|
-
|
159
|
-
|
155
|
+
# find all paragraphs.
|
156
|
+
elements = doc.search("/html/body//p")
|
157
|
+
# continue the search by finding any images within those paragraphs.
|
158
|
+
(elements/"img")
|
159
|
+
#=> #<Hpricot::Elements[{img ...}, {img ...}]>
|
160
160
|
|
161
161
|
Searches can also be continued by searching within container elements.
|
162
162
|
|
163
|
-
|
164
|
-
|
165
|
-
|
166
|
-
|
163
|
+
# find all images within paragraphs.
|
164
|
+
doc.search("/html/body//p").each do |para|
|
165
|
+
puts "== Found a paragraph =="
|
166
|
+
pp para
|
167
167
|
|
168
|
-
|
169
|
-
|
170
|
-
|
171
|
-
|
172
|
-
|
168
|
+
imgs = para.search("img")
|
169
|
+
if imgs.any?
|
170
|
+
puts "== Found #{imgs.length} images inside =="
|
171
|
+
end
|
172
|
+
end
|
173
173
|
|
174
174
|
Of course, the most succinct ways to do the above are using CSS or XPath.
|
175
175
|
|
176
|
-
|
177
|
-
|
178
|
-
|
179
|
-
|
180
|
-
|
181
|
-
|
176
|
+
# the xpath version
|
177
|
+
(doc/"/html/body//p//img")
|
178
|
+
# the css version
|
179
|
+
(doc/"html > body > p img")
|
180
|
+
# ..or symbols work, too!
|
181
|
+
(doc/:html/:body/:p/:img)
|
182
182
|
|
183
|
-
|
183
|
+
### Looping Edits
|
184
184
|
|
185
185
|
You may certainly edit objects from within your search loops. Then, when you
|
186
186
|
spit out the HTML, the altered elements will show.
|
187
187
|
|
188
|
-
|
189
|
-
|
190
|
-
|
191
|
-
|
188
|
+
|
189
|
+
(doc/"span.entryPermalink").each do |span|
|
190
|
+
span.attributes['class'] = 'newLinks'
|
191
|
+
end
|
192
|
+
puts doc
|
192
193
|
|
193
194
|
This changes all <tt>span.entryPermalink</tt> elements to
|
194
195
|
<tt>span.newLinks</tt>. Keep in mind that there are often more convenient ways
|
195
196
|
of doing this. Such as the <tt>set</tt> method:
|
196
197
|
|
197
|
-
|
198
|
+
(doc/"span.entryPermalink").set(:class => 'newLinks')
|
198
199
|
|
199
|
-
|
200
|
+
### Figuring Out Paths
|
200
201
|
|
201
202
|
Every element can tell you its unique path (either XPath or CSS) to get to the
|
202
203
|
element from the root tag.
|
203
204
|
|
204
205
|
The <tt>css_path</tt> method:
|
205
206
|
|
206
|
-
|
207
|
-
|
208
|
-
|
209
|
-
|
207
|
+
doc.at("div > div:nth(1)").css_path
|
208
|
+
#=> "div > div:nth(1)"
|
209
|
+
doc.at("#header").css_path
|
210
|
+
#=> "#header"
|
210
211
|
|
211
212
|
Or, the <tt>xpath</tt> method:
|
212
213
|
|
213
|
-
|
214
|
-
|
215
|
-
|
216
|
-
|
214
|
+
doc.at("div > div:nth(1)").xpath
|
215
|
+
#=> "/div/div:eq(1)"
|
216
|
+
doc.at("#header").xpath
|
217
|
+
#=> "//div[@id='header']"
|
217
218
|
|
218
|
-
|
219
|
+
## Hpricot Fixups
|
219
220
|
|
220
221
|
When loading HTML documents, you have a few settings that can make Hpricot more
|
221
222
|
or less intense about how it gets involved.
|
222
223
|
|
223
|
-
|
224
|
+
## :fixup_tags
|
224
225
|
|
225
226
|
Really, there are so many ways to clean up HTML and your intentions may be to
|
226
227
|
keep the HTML as-is. So Hpricot's default behavior is to keep things flexible.
|
@@ -229,7 +230,7 @@ Making sure to open and close all the tags, but ignore any validation problems.
|
|
229
230
|
As of Hpricot 0.4, there's a new <tt>:fixup_tags</tt> option which will attempt
|
230
231
|
to shift the document's tags to meet XHTML 1.0 Strict.
|
231
232
|
|
232
|
-
|
233
|
+
doc = open("index.html") { |f| Hpricot f, :fixup_tags => true }
|
233
234
|
|
234
235
|
This doesn't quite meet the XHTML 1.0 Strict standard, it just tries to follow
|
235
236
|
the rules a bit better. Like: say Hpricot finds a paragraph in a link, it's
|
@@ -238,13 +239,13 @@ where paragraphs don't belong.
|
|
238
239
|
|
239
240
|
If an unknown element is found, it is ignored. Again, <tt>:fixup_tags</tt>.
|
240
241
|
|
241
|
-
|
242
|
+
## :xhtml_strict
|
242
243
|
|
243
244
|
So, let's go beyond just trying to fix the hierarchy. The
|
244
245
|
<tt>:xhtml_strict</tt> option really tries to force the document to be an XHTML
|
245
246
|
1.0 Strict document. Even at the cost of removing elements that get in the way.
|
246
247
|
|
247
|
-
|
248
|
+
doc = open("index.html") { |f| Hpricot f, :xhtml_strict => true }
|
248
249
|
|
249
250
|
What measures does <tt>:xhtml_strict</tt> take?
|
250
251
|
|
@@ -254,7 +255,7 @@ What measures does <tt>:xhtml_strict</tt> take?
|
|
254
255
|
4. Remove illegal content.
|
255
256
|
5. Alter the doctype to XHTML 1.0 Strict.
|
256
257
|
|
257
|
-
|
258
|
+
## Hpricot.XML()
|
258
259
|
|
259
260
|
The last option is the <tt>:xml</tt> option, which makes some slight variations
|
260
261
|
on the standard mode. The main difference is that :xml mode won't try to output
|
@@ -266,9 +267,9 @@ to case, friends.
|
|
266
267
|
|
267
268
|
The primary way to use Hpricot's XML mode is to call the Hpricot.XML method:
|
268
269
|
|
269
|
-
|
270
|
-
|
271
|
-
|
270
|
+
doc = open("http://redhanded.hobix.com/index.xml") do |f|
|
271
|
+
Hpricot.XML(f)
|
272
|
+
end
|
272
273
|
|
273
274
|
*Also, :fixup_tags is canceled out by the :xml option.* This is because
|
274
275
|
:fixup_tags makes assumptions based how HTML is structured. Specifically, how
|
data/Rakefile
CHANGED
@@ -1,10 +1,12 @@
|
|
1
|
-
require 'rake'
|
2
1
|
require 'rake/clean'
|
3
2
|
require 'rake/gempackagetask'
|
4
3
|
require 'rake/rdoctask'
|
5
4
|
require 'rake/testtask'
|
6
|
-
|
7
|
-
|
5
|
+
begin
|
6
|
+
require 'rake/extensiontask'
|
7
|
+
rescue LoadError
|
8
|
+
abort "To build, please first gem install rake-compiler"
|
9
|
+
end
|
8
10
|
|
9
11
|
RbConfig = Config unless defined?(RbConfig)
|
10
12
|
|
@@ -12,13 +14,14 @@ NAME = "hpricot"
|
|
12
14
|
REV = (`#{ENV['GIT'] || "git"} rev-list HEAD`.split.length + 1).to_s
|
13
15
|
VERS = ENV['VERSION'] || "0.8" + (REV ? ".#{REV}" : "")
|
14
16
|
PKG = "#{NAME}-#{VERS}"
|
15
|
-
BIN = "*.{bundle,jar,so,o,obj,pdb,lib,def,exp,class}"
|
16
|
-
CLEAN.include ["
|
17
|
+
BIN = "*.{bundle,jar,so,o,obj,pdb,lib,def,exp,class,rbc}"
|
18
|
+
CLEAN.include ["#{BIN}", "ext/**/#{BIN}", "lib/**/#{BIN}", "test/**/#{BIN}",
|
17
19
|
'ext/fast_xs/Makefile', 'ext/hpricot_scan/Makefile',
|
18
|
-
'**/.*.sw?', '*.gem', '.config', 'pkg']
|
19
|
-
RDOC_OPTS = ['--quiet', '--title', 'The Hpricot Reference', '--main', 'README', '--inline-source']
|
20
|
-
PKG_FILES = %w(CHANGELOG COPYING README Rakefile) +
|
21
|
-
Dir.glob("{bin,doc,test,
|
20
|
+
'**/.*.sw?', '*.gem', '.config', 'pkg', 'lib/hpricot_scan.rb', 'lib/fast_xs.rb']
|
21
|
+
RDOC_OPTS = ['--quiet', '--title', 'The Hpricot Reference', '--main', 'README.md', '--inline-source']
|
22
|
+
PKG_FILES = %w(CHANGELOG COPYING README.md Rakefile) +
|
23
|
+
Dir.glob("{bin,doc,test,extras}/**/*") +
|
24
|
+
(Dir.glob("lib/**/*.rb") - %w(lib/hpricot_scan.rb lib/fast_xs.rb)) +
|
22
25
|
Dir.glob("ext/**/*.{h,java,c,rb,rl}") +
|
23
26
|
%w[ext/hpricot_scan/hpricot_scan.c ext/hpricot_scan/hpricot_css.c ext/hpricot_scan/HpricotScanService.java] # needed because they are generated later
|
24
27
|
RAGEL_C_CODE_GENERATION_STYLES = {
|
@@ -39,7 +42,7 @@ SPEC =
 s.platform = Gem::Platform::RUBY
 s.has_rdoc = true
 s.rdoc_options += RDOC_OPTS
-s.extra_rdoc_files = ["README", "CHANGELOG", "COPYING"]
+s.extra_rdoc_files = ["README.md", "CHANGELOG", "COPYING"]
 s.summary = "a swift, liberal HTML parser with a fantastic library"
 s.description = s.summary
 s.author = "why the lucky stiff"
@@ -52,26 +55,54 @@ SPEC =
|
|
52
55
|
s.bindir = "bin"
|
53
56
|
end
|
54
57
|
|
55
|
-
|
56
|
-
|
57
|
-
|
58
|
-
|
58
|
+
# FAT cross-compile
|
59
|
+
# Pass RUBY_CC_VERSION=1.8.7:1.9.2 when packaging for 1.8+1.9 mswin32 binaries
|
60
|
+
%w(hpricot_scan fast_xs).each do |target|
|
61
|
+
Rake::ExtensionTask.new(target, SPEC) do |ext|
|
62
|
+
ext.lib_dir = File.join('lib', target) if ENV['RUBY_CC_VERSION']
|
63
|
+
ext.cross_compile = true # enable cross compilation (requires cross compile toolchain)
|
64
|
+
ext.cross_platform = 'i386-mswin32' # forces the Windows platform instead of the default one
|
65
|
+
end
|
59
66
|
|
60
|
-
|
67
|
+
# HACK around 1.9.2 cross .def file creation
|
68
|
+
def_file = "tmp/i386-mswin32/#{target}/1.9.2/#{target}-i386-mingw32.def"
|
69
|
+
directory File.dirname(def_file)
|
70
|
+
file def_file => File.dirname(def_file) do |t|
|
71
|
+
File.open(t.name, "w") do |f|
|
72
|
+
f << "EXPORTS\nInit_#{target}\n"
|
73
|
+
end
|
74
|
+
end
|
75
|
+
|
76
|
+
task File.join(File.dirname(def_file), "Makefile") => def_file
|
77
|
+
# END HACK
|
78
|
+
file "lib/#{target}.rb" do |t|
|
79
|
+
File.open(t.name, "w") do |f|
|
80
|
+
f.puts %{require "#{target}/\#{RUBY_VERSION.sub(/\\.\\d+$/, '')}/#{target}"}
|
81
|
+
end
|
82
|
+
end
|
83
|
+
end
|
84
|
+
file 'ext/hpricot_scan/extconf.rb' => :ragel
|
85
|
+
|
86
|
+
desc "set environment variables to build and/or test with debug options"
|
87
|
+
task :debug do
|
88
|
+
ENV['CFLAGS'] ||= ""
|
89
|
+
ENV['CFLAGS'] += " -g -DDEBUG"
|
90
|
+
end
|
61
91
|
|
62
92
|
desc "Does a full compile, test run"
|
63
93
|
if defined?(JRUBY_VERSION)
|
64
|
-
task :default => [:compile_java, :test]
|
94
|
+
task :default => [:compile_java, :clean_fat_rb, :test]
|
65
95
|
else
|
66
|
-
task :default => [:compile, :test]
|
96
|
+
task :default => [:compile, :clean_fat_rb, :test]
|
67
97
|
end
|
68
98
|
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
task :release => [:package, :package_win32, :package_jruby]
|
99
|
+
task :clean_fat_rb do
|
100
|
+
rm_f "lib/hpricot_scan.rb"
|
101
|
+
rm_f "lib/fast_xs.rb"
|
102
|
+
end
|
74
103
|
|
104
|
+
desc "Packages up Hpricot for all platforms."
|
105
|
+
task :package => [:clean]
|
75
106
|
|
76
107
|
desc "Run all the tests"
|
77
108
|
Rake::TestTask.new do |t|
|
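Note on the fat-binary stub in the hunk above: the generated `lib/<target>.rb` file requires the compiled extension from a subdirectory named after the running Ruby's major.minor version, which is how one gem can carry both 1.8 and 1.9 binaries. Below is a minimal sketch of the path that stub computes. `fat_require_path` is a hypothetical name used only for this illustration; the Rakefile writes the expression inline into the stub instead.

```ruby
# Sketch of the path computed by the generated lib/<target>.rb stub.
# `fat_require_path` is a hypothetical name, not something the Rakefile defines.
def fat_require_path(target, ruby_version = RUBY_VERSION)
  # Drop the trailing ".N" component: "1.8.7" -> "1.8", "1.9.2" -> "1.9".
  abi = ruby_version.sub(/\.\d+$/, '')
  "#{target}/#{abi}/#{target}"
end

# Show the directories each extension would be loaded from for the two
# versions named in the RUBY_CC_VERSION comment.
%w(hpricot_scan fast_xs).each do |target|
  puts fat_require_path(target, "1.8.7")
  puts fat_require_path(target, "1.9.2")
end
```

These paths line up with the `lib/#{t}/1.8/#{t}.so` and `lib/#{t}/1.9/#{t}.so` entries that the Win32 spec lists later in the Rakefile.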
@@ -83,8 +114,8 @@ end
 Rake::RDocTask.new do |rdoc|
 rdoc.rdoc_dir = 'doc/rdoc'
 rdoc.options += RDOC_OPTS
-rdoc.main = "README"
-rdoc.rdoc_files.add ['README', 'CHANGELOG', 'COPYING', 'lib/**/*.rb']
+rdoc.main = "README.md"
+rdoc.rdoc_files.add ['README.md', 'CHANGELOG', 'COPYING', 'lib/**/*.rb']
 end
 
 Rake::GemPackageTask.new(SPEC) do |p|
@@ -92,53 +123,32 @@ Rake::GemPackageTask.new(SPEC) do |p|
|
|
92
123
|
p.gem_spec = SPEC
|
93
124
|
end
|
94
125
|
|
95
|
-
|
96
|
-
|
97
|
-
|
98
|
-
|
99
|
-
|
100
|
-
"
|
101
|
-
|
102
|
-
"#{ext}/extconf.rb",
|
103
|
-
"#{ext}/Makefile",
|
104
|
-
"lib"
|
105
|
-
]
|
106
|
-
|
107
|
-
desc "Builds just the #{extension} extension"
|
108
|
-
task extension.to_sym => [:ragel, "#{ext}/Makefile", ext_so ]
|
109
|
-
|
110
|
-
file "#{ext}/Makefile" => ["#{ext}/extconf.rb"] do
|
111
|
-
Dir.chdir(ext) do ruby "extconf.rb" end
|
112
|
-
end
|
113
|
-
|
114
|
-
file ext_so => ext_files do
|
115
|
-
Dir.chdir(ext) do
|
116
|
-
sh(RUBY_PLATFORM =~ /mswin/ ? 'nmake' : 'make')
|
126
|
+
### Win32 Packages ###
|
127
|
+
Win32Spec = SPEC.dup
|
128
|
+
Win32Spec.platform = 'i386-mswin32'
|
129
|
+
Win32Spec.files = PKG_FILES + %w(hpricot_scan fast_xs).map do |t|
|
130
|
+
unless ENV['RUBY_CC_VERSION']
|
131
|
+
file "lib/#{t}/1.8/#{t}.so" do
|
132
|
+
abort "ERROR while packaging: re-run for fat win32 gems:\nrake #{ARGV.join(' ')} RUBY_CC_VERSION=1.8.7:1.9.2"
|
117
133
|
end
|
118
|
-
cp ext_so, "lib"
|
119
134
|
end
|
135
|
+
["lib/#{t}.rb", "lib/#{t}/1.8/#{t}.so", "lib/#{t}/1.9/#{t}.so"]
|
136
|
+
end.flatten
|
137
|
+
Win32Spec.extensions = []
|
120
138
|
|
121
|
-
|
122
|
-
|
123
|
-
|
124
|
-
sh "cd #{WIN32_PKG_DIR}/ext/#{extension}/ && ruby -I. extconf.rb && make"
|
125
|
-
mv "#{WIN32_PKG_DIR}/ext/#{extension}/#{extension}.so", "#{WIN32_PKG_DIR}/lib"
|
126
|
-
end
|
139
|
+
Rake::GemPackageTask.new(Win32Spec) do |p|
|
140
|
+
p.need_tar = false
|
141
|
+
p.gem_spec = Win32Spec
|
127
142
|
end
|
128
143
|
|
129
|
-
|
130
|
-
|
131
|
-
|
144
|
+
JRubySpec = SPEC.dup
|
145
|
+
JRubySpec.platform = 'java'
|
146
|
+
JRubySpec.files = PKG_FILES + ["lib/hpricot_scan.jar", "lib/fast_xs.jar"]
|
147
|
+
JRubySpec.extensions = []
|
132
148
|
|
133
|
-
|
134
|
-
|
135
|
-
|
136
|
-
STDERR.puts "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
|
137
|
-
STDERR.puts "Gem actually failed to build. Your system is"
|
138
|
-
STDERR.puts "NOT configured properly to build hpricot."
|
139
|
-
STDERR.puts "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
|
140
|
-
exit(1)
|
141
|
-
end
|
149
|
+
Rake::GemPackageTask.new(JRubySpec) do |p|
|
150
|
+
p.need_tar = false
|
151
|
+
p.gem_spec = JRubySpec
|
142
152
|
end
|
143
153
|
|
144
154
|
desc "Determines the Ragel version and displays it on the console along with the location of the Ragel binary."
|
@@ -178,27 +188,7 @@ task :ragel_java => [:ragel_version] do
|
|
178
188
|
end
|
179
189
|
end
|
180
190
|
|
181
|
-
###
|
182
|
-
|
183
|
-
desc "Package up the Win32 distribution."
|
184
|
-
file WIN32_PKG_DIR => [:package] do
|
185
|
-
sh "tar zxf pkg/#{PKG}.tgz"
|
186
|
-
mv PKG, WIN32_PKG_DIR
|
187
|
-
end
|
188
|
-
|
189
|
-
desc "Build the binary RubyGems package for win32"
|
190
|
-
task :package_win32 => ["fast_xs_win32", "hpricot_scan_win32"] do
|
191
|
-
Dir.chdir("#{WIN32_PKG_DIR}") do
|
192
|
-
Gem::Builder.new(Win32Spec).build
|
193
|
-
verbose(true) {
|
194
|
-
mv Dir["*.gem"].first, "../pkg/"
|
195
|
-
}
|
196
|
-
end
|
197
|
-
end
|
198
|
-
|
199
|
-
CLEAN.include WIN32_PKG_DIR
|
200
|
-
|
201
|
-
### JRuby Packages ###
|
191
|
+
### JRuby Compile ###
|
202
192
|
|
203
193
|
def java_classpath_arg # myriad of ways to discover JRuby classpath
|
204
194
|
begin
|
@@ -211,7 +201,11 @@ def java_classpath_arg # myriad of ways to discover JRuby classpath
|
|
211
201
|
jruby_cpath = ENV['JRUBY_PARENT_CLASSPATH'] || ENV['JRUBY_HOME'] &&
|
212
202
|
FileList["#{ENV['JRUBY_HOME']}/lib/*.jar"].join(File::PATH_SEPARATOR)
|
213
203
|
end
|
214
|
-
jruby_cpath
|
204
|
+
unless jruby_cpath || ENV['CLASSPATH'] =~ /jruby/
|
205
|
+
abort %{WARNING: No JRuby classpath has been set up.
|
206
|
+
Define JRUBY_HOME=/path/to/jruby on the command line or in the environment}
|
207
|
+
end
|
208
|
+
"-cp \"#{jruby_cpath}\""
|
215
209
|
end
|
216
210
|
|
217
211
|
def compile_java(filenames, jarname)
|
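Note on the classpath change in the hunk above: the new tail of `java_classpath_arg` aborts early when no JRuby classpath can be found, rather than returning a value that fails later during compilation. The sketch below covers only the environment-variable fallback visible in this hunk; the earlier part of the method is not shown in the diff and is left out. `classpath_arg` and its `jar_lister` parameter are hypothetical names. The sketch takes the environment as a Hash and raises instead of calling `abort`, so it can be run in isolation.

```ruby
# Hypothetical stand-in for the visible tail of java_classpath_arg.
# Assumptions: `env` is a Hash shaped like ENV; `jar_lister` replaces the
# FileList lookup so the example does not depend on a JRuby install.
def classpath_arg(env, jar_lister = lambda { |home| Dir["#{home}/lib/*.jar"] })
  cpath = env['JRUBY_PARENT_CLASSPATH']
  if cpath.nil? && env['JRUBY_HOME']
    cpath = jar_lister.call(env['JRUBY_HOME']).join(File::PATH_SEPARATOR)
  end

  # Fail fast when neither variable is set and CLASSPATH does not mention jruby.
  unless cpath || env['CLASSPATH'].to_s =~ /jruby/
    raise ArgumentError, "No JRuby classpath has been set up; define JRUBY_HOME"
  end

  # As in the diff, the value is always wrapped as a -cp argument, even
  # when only CLASSPATH mentioned jruby and cpath itself is empty.
  "-cp \"#{cpath}\""
end

puts classpath_arg({ 'JRUBY_PARENT_CLASSPATH' => '/opt/jruby/lib/jruby.jar' })
```

The path `/opt/jruby/lib/jruby.jar` is a placeholder for illustration only.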
@@ -231,42 +225,10 @@ task :fast_xs_java do
|
|
231
225
|
end
|
232
226
|
end
|
233
227
|
|
234
|
-
|
235
|
-
|
236
|
-
|
237
|
-
end
|
238
|
-
|
239
|
-
JRubySpec = SPEC.dup
|
240
|
-
JRubySpec.platform = 'java'
|
241
|
-
JRubySpec.files = PKG_FILES + ["lib/hpricot_scan.jar", "lib/fast_xs.jar"]
|
242
|
-
JRubySpec.extensions = []
|
243
|
-
|
244
|
-
JRUBY_PKG_DIR = "#{PKG}-java"
|
245
|
-
|
246
|
-
desc "Package up the JRuby distribution."
|
247
|
-
file JRUBY_PKG_DIR => [:ragel_java, :package] do
|
248
|
-
sh "tar zxf pkg/#{PKG}.tgz"
|
249
|
-
mv PKG, JRUBY_PKG_DIR
|
250
|
-
end
|
251
|
-
|
252
|
-
desc "Build the RubyGems package for JRuby"
|
253
|
-
task :package_jruby => JRUBY_PKG_DIR do
|
254
|
-
Dir.chdir("#{JRUBY_PKG_DIR}") do
|
255
|
-
Rake::Task[:compile_java].invoke
|
256
|
-
Gem::Builder.new(JRubySpec).build
|
257
|
-
verbose(true) {
|
258
|
-
mv Dir["*.gem"].first, "../pkg/#{JRUBY_PKG_DIR}.gem"
|
259
|
-
}
|
228
|
+
%w(hpricot_scan fast_xs).each do |ext|
|
229
|
+
file "lib/#{ext}.jar" => "#{ext}_java" do |t|
|
230
|
+
mv "ext/#{ext}/#{ext}.jar", "lib"
|
260
231
|
end
|
232
|
+
task :compile_java => "lib/#{ext}.jar"
|
261
233
|
end
|
262
234
|
|
263
|
-
CLEAN.include JRUBY_PKG_DIR
|
264
|
-
|
265
|
-
task :install do
|
266
|
-
sh %{rake package}
|
267
|
-
sh %{sudo gem install pkg/#{NAME}-#{VERS}}
|
268
|
-
end
|
269
|
-
|
270
|
-
task :uninstall => [:clean] do
|
271
|
-
sh %{sudo gem uninstall #{NAME}}
|
272
|
-
end
|