spider 0.4.2 → 0.4.3

Sign up to get free protection for your applications and to get access to all the features.
data/CHANGES CHANGED
@@ -1,3 +1,6 @@
1
+ 2008-10-09
2
+ * fixed a situation with nested slashes in urls, thanks to Sander van der Vliet and John Buckley
3
+
1
4
  2008-07-06
2
5
  * Trap interrupts and shutdown gracefully
3
6
  * Support for custom urls-to-crawl objects
data/README CHANGED
@@ -1,3 +1,4 @@
1
+
1
2
  Spider, a Web spidering library for Ruby. It handles the robots.txt,
2
3
  scraping, collecting, and looping so that you can just handle the data.
3
4
 
@@ -132,9 +133,14 @@ scraping, collecting, and looping so that you can just handle the data.
132
133
  == Author
133
134
 
134
135
  John Nagro john.nagro@gmail.com
136
+
135
137
  Mike Burns http://mike-burns.com mike@mike-burns.com (original author)
136
138
 
137
- Help from Matt Horan, and Henri Cook.
139
+ Many thanks to:
140
+ Matt Horan
141
+ Henri Cook
142
+ Sander van der Vliet
143
+ John Buckley
138
144
 
139
145
  With `robot_rules' from James Edward Gray II via
140
146
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/177589
@@ -295,15 +295,17 @@ class SpiderInstance
295
295
  def construct_complete_url(base_url, additional_url, parsed_additional_url = nil) #:nodoc:
296
296
  parsed_additional_url ||= URI.parse(additional_url)
297
297
  case parsed_additional_url.scheme
298
- when nil
299
- u = base_url.is_a?(URI) ? base_url : URI.parse(base_url)
300
- if additional_url[0].chr == '/'
301
- "#{u.scheme}://#{u.host}:#{u.port}#{additional_url}"
302
- elsif u.path.nil? || u.path == ''
303
- "#{u.scheme}://#{u.host}:#{u.port}/#{additional_url}"
304
- else
305
- "#{u.scheme}://#{u.host}:#{u.port}/#{u.path}/#{additional_url}"
306
- end
298
+ when nil
299
+ u = base_url.is_a?(URI) ? base_url : URI.parse(base_url)
300
+ if additional_url[0].chr == '/'
301
+ "#{u.scheme}://#{u.host}#{additional_url}"
302
+ elsif u.path.nil? || u.path == ''
303
+ "#{u.scheme}://#{u.host}/#{additional_url}"
304
+ elsif u.path[0].chr == '/'
305
+ "#{u.scheme}://#{u.host}#{u.path}/#{additional_url}"
306
+ else
307
+ "#{u.scheme}://#{u.host}/#{u.path}/#{additional_url}"
308
+ end
307
309
  else
308
310
  additional_url
309
311
  end
@@ -14,5 +14,5 @@ spec = Gem::Specification.new do |s|
14
14
  A Web spidering library: handles robots.txt, scraping, finding more
15
15
  links, and doing it all over again.
16
16
  EOF
17
- s.version = '0.4.2'
17
+ s.version = '0.4.3'
18
18
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: spider
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.2
4
+ version: 0.4.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - John Nagro
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2008-07-06 00:00:00 -04:00
12
+ date: 2008-10-09 00:00:00 -04:00
13
13
  default_executable:
14
14
  dependencies: []
15
15