spider 0.4.2 → 0.4.3

data/CHANGES CHANGED
@@ -1,3 +1,6 @@
+ 2008-10-09
+ * fixed a situation with nested slashes in urls, thanks to Sander van der Vliet and John Buckley
+
  2008-07-06
  * Trap interrupts and shutdown gracefully
  * Support for custom urls-to-crawl objects
data/README CHANGED
@@ -1,3 +1,4 @@
+
  Spider, a Web spidering library for Ruby. It handles the robots.txt,
  scraping, collecting, and looping so that you can just handle the data.
 
@@ -132,9 +133,14 @@ scraping, collecting, and looping so that you can just handle the data.
  == Author
 
  John Nagro john.nagro@gmail.com
+
  Mike Burns http://mike-burns.com mike@mike-burns.com (original author)
 
- Help from Matt Horan, and Henri Cook.
+ Many thanks to:
+ Matt Horan
+ Henri Cook
+ Sander van der Vliet
+ John Buckley
 
  With `robot_rules' from James Edward Gray II via
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/177589
@@ -295,15 +295,17 @@ class SpiderInstance
  def construct_complete_url(base_url, additional_url, parsed_additional_url = nil) #:nodoc:
  parsed_additional_url ||= URI.parse(additional_url)
  case parsed_additional_url.scheme
- when nil
- u = base_url.is_a?(URI) ? base_url : URI.parse(base_url)
- if additional_url[0].chr == '/'
- "#{u.scheme}://#{u.host}:#{u.port}#{additional_url}"
- elsif u.path.nil? || u.path == ''
- "#{u.scheme}://#{u.host}:#{u.port}/#{additional_url}"
- else
- "#{u.scheme}://#{u.host}:#{u.port}/#{u.path}/#{additional_url}"
- end
+ when nil
+ u = base_url.is_a?(URI) ? base_url : URI.parse(base_url)
+ if additional_url[0].chr == '/'
+ "#{u.scheme}://#{u.host}#{additional_url}"
+ elsif u.path.nil? || u.path == ''
+ "#{u.scheme}://#{u.host}/#{additional_url}"
+ elsif u.path[0].chr == '/'
+ "#{u.scheme}://#{u.host}#{u.path}/#{additional_url}"
+ else
+ "#{u.scheme}://#{u.host}/#{u.path}/#{additional_url}"
+ end
  else
  additional_url
  end
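The effect of the hunk above can be tried outside the gem. The following is a minimal sketch, assuming the 0.4.3 body is copied into a top-level method (in the gem it is an instance method of SpiderInstance) and using made-up example.com URLs. With a base path such as /a/b, the 0.4.2 branch interpolated "/#{u.path}/", which yields a doubled slash after the host; the new `u.path[0].chr == '/'` branch avoids that.

```ruby
require 'uri'

# Standalone copy of the 0.4.3 logic shown in the diff, for experimentation only.
# Defined at top level here; the gem defines it on SpiderInstance.
def construct_complete_url(base_url, additional_url, parsed_additional_url = nil)
  parsed_additional_url ||= URI.parse(additional_url)
  case parsed_additional_url.scheme
  when nil
    # Relative link: rebuild it from the scheme, host and path of the base URL.
    u = base_url.is_a?(URI) ? base_url : URI.parse(base_url)
    if additional_url[0].chr == '/'
      "#{u.scheme}://#{u.host}#{additional_url}"
    elsif u.path.nil? || u.path == ''
      "#{u.scheme}://#{u.host}/#{additional_url}"
    elsif u.path[0].chr == '/'
      # New in 0.4.3: the base path already starts with a slash, so none is added.
      "#{u.scheme}://#{u.host}#{u.path}/#{additional_url}"
    else
      "#{u.scheme}://#{u.host}/#{u.path}/#{additional_url}"
    end
  else
    # The link already carries a scheme, so it is returned unchanged.
    additional_url
  end
end

# Hypothetical inputs, not taken from the gem's specs.
puts construct_complete_url('http://example.com/a/b', 'c')           # http://example.com/a/b/c
puts construct_complete_url('http://example.com', 'x')               # http://example.com/x
puts construct_complete_url('http://example.com/a', '/x')            # http://example.com/x
puts construct_complete_url('http://example.com/a', 'http://o.org/') # http://o.org/
```

One consequence visible in the diff: the 0.4.3 strings no longer interpolate `u.port`, so in this sketch a base URL such as http://example.com:8080/a resolves a relative link to a URL without the :8080 part.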
@@ -14,5 +14,5 @@ spec = Gem::Specification.new do |s|
  A Web spidering library: handles robots.txt, scraping, finding more
  links, and doing it all over again.
  EOF
- s.version = '0.4.2'
+ s.version = '0.4.3'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: spider
  version: !ruby/object:Gem::Version
- version: 0.4.2
+ version: 0.4.3
  platform: ruby
  authors:
  - John Nagro
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2008-07-06 00:00:00 -04:00
+ date: 2008-10-09 00:00:00 -04:00
  default_executable:
  dependencies: []